  Zebra Server - Administrator's Guide and Reference
  Index Data, info@indexdata.dk
  $Revision: 1.47 $

  The Zebra server combines a versatile fielded/free-text indexing/search
  engine with a Z39.50-1995 frontend to provide a powerful and flexible
  information mining tool. This document explains the procedure for
  installing and configuring Zebra, and outlines the possibilities for
  managing data and providing Z39.50 services with the software. Zebra is
  a free version of the Index Data Z'mbol information system, and it
  excludes some functionality such as incremental database updating and
  support for large databases.
  ______________________________________________________________________

  Table of Contents



  1. Introduction
     1.1 Overview
     1.2 Features
     1.3 Future Work

  2. Compiling the software
     2.1 UNIX
     2.2 WIN32

  3. Quick Start
  4. Administrating Zebra
     4.1 Record Types
     4.2 The Zebra Configuration File
     4.3 Locating Records
     4.4 Indexing example

  5. Running the Maintenance Interface (zebraidx)
  6. The Z39.50 Server
     6.1 Running the Z39.50 Server (zebrasrv)
     6.2 Z39.50 Protocol Support and Behavior
        6.2.1 Initialization
        6.2.2 Search
           6.2.2.1 Regular expressions
           6.2.2.2 Query examples
        6.2.3 Present
        6.2.4 Scan
        6.2.5 Sort
        6.2.6 Close

  7. The Record Model
     7.1 Local Representation
        7.1.1 Canonical Input Format
           7.1.1.1 Record Root
           7.1.1.2 Variants
        7.1.2 Input Filters
     7.2 Internal Representation
        7.2.1 Tagged Elements
        7.2.2 Variants
        7.2.3 Data Elements
     7.3 Configuring Your Data Model
        7.3.1 About Object Identifiers
        7.3.2 The Abstract Syntax
        7.3.3 The Configuration Files
        7.3.4 The Abstract Syntax (.abs) Files
        7.3.5 The Attribute Set (.att) Files
        7.3.6 The Tag Set (.tag) Files
        7.3.7 The Variant Set (.var) Files
        7.3.8 The Element Set (.est) Files
        7.3.9 The Schema Mapping (.map) Files
        7.3.10 The MARC (ISO2709) Representation (.mar) Files
        7.3.11 Field Structure and Character Sets
     7.4 Exchange Formats

  8. License
  9. About Index Data and the Zebra Server


  ______________________________________________________________________

  1.  Introduction

  1.1.  Overview

  Zebra is a fielded free-text indexing and retrieval engine with a
  Z39.50 frontend. You can use any commercial or freeware Z39.50 client
  to access data stored in Zebra.

  The Zebra server can be used at the core of a Z39.50-based information
  retrieval framework. We're making the server available now to allow
  researchers and small organisations to share their information in the
  best possible way. We believe that Z39.50 currently represents one of
  the best ways of sharing information with others, and we would like to
  encourage as many people as possible to do so.  This document is a
  guide to using Zebra. It will tell you how to compile the software,
  and how to prepare your first database.  It also explains how the
  server can be configured to give you the functionality that you need.

  If you find the software interesting, you should join the support
  mailing-list by sending email to zebra-request@indexdata.dk.

  If you are interested in running a commercial service, if you wish to
  run large databases, or if you wish to make incremental updates to
  your databases even while users are accessing your system, then you
  might be interested in the Z'mbol Information Server which is
  available from Index Data or Fretwell-Downing Informatics. Z'mbol is a
  complete and supported package which offers many exciting
  possibilities that we have not been able to fit into this package.


  1.2.  Features

  This is a list of some of the most important features of the system.


  o  Supports arbitrarily complex records - base input format is an XML-
     like syntax which allows nested (structured) data elements, as well
     as variant forms of data.

  o  Supports arbitrary storage formats. A system of input filters driven
     by regular expressions allows you to easily process most ASCII-
     based data formats. SGML/XML, ISO2709 (MARC), and raw text are also
     supported.

  o  Supports boolean queries as well as relevance-ranking (free-text)
     searching. Right truncation and masking in terms are supported, as
     well as full regular expressions.

  o  Supports multiple concrete syntaxes for record exchange (depending
     on the configuration): GRS-1, SUTRS, ISO2709 (*MARC), XML. Records
     can be mapped between record syntaxes and schema on the fly.

  o  Supports approximate matching in registers (i.e. spelling mistakes,
     etc.).

  o  Supports a subset of the Z39.50 Explain Facility. Zebra's Explain
     database is automatically updated when a set of records is loaded
     into Zebra.


  Protocol support:


  o  Protocol facilities: Init, Search, Retrieve, Browse, Sort, Close,
     and Explain.

  o  Piggy-backed presents are honored in the search-request.

  o  Named result sets are supported.

  o  Easily configured to support different application profiles, with
     tables for attribute sets, tag sets, and abstract syntaxes.
     Additional tables control facilities such as element mappings to
     different schema (e.g., GILS-to-USMARC).

  o  Complex composition specifications using Espec-1 are partially
     supported (simple element requests only).

  o  Element Set Names are defined using the Espec-1 capability of the
     system, and are given in configuration files as simple element
     requests (and possibly variant requests).

  o  Zebra runs on most Unix-like systems as well as Windows NT - a
     binary distribution for Windows NT is forthcoming - so far, the
     installation requires Microsoft Visual C++ to compile the system
     (we use version 6.0).


  1.3.  Future Work


  These are some of the plans that we have for the software in the near
  and far future, approximately ordered by their relative importance.
  Items marked with an asterisk will be implemented before the last beta
  release.


  o  *Complete the support for variants.

  o  *Finalize the data element include facility to support multimedia
     data elements in records.

  o  Add more sophisticated relevance ranking mechanisms. Add support
     for soundex and stemming. Add relevance feedback support.

  o  Complete EXPLAIN support.

  o  We want to add a management system that allows you to control your
     databases and configuration tables from a graphical interface.
     We'll probably use Tcl/Tk to stay platform-independent.

  Programmers thrive on user feedback. If you are interested in a
  facility that you don't see mentioned here, or if there's something
  you think we could do better, please drop us a mail. If you think it's
  all really neat, you're welcome to drop us a line saying that, too.
  You'll find contact info at the end of this file.


  2.  Compiling the software

  You need the YAZ package in order to compile this software. We suggest
  you unpack YAZ in the same directory as Zebra. Running ./configure
  (UNIX only) and running make (nmake on WIN32) is usually what it
  takes to compile YAZ.


  2.1.  UNIX

  An ANSI C compiler is required to compile the Zebra server system --
  gcc works very well if your own system doesn't provide an adequate
  compiler.

  Unpack the distribution archive. The configure shell script attempts
  to guess correct values for various system-dependent variables used
  during compilation. It uses those values to create a 'Makefile' in
  each directory of Zebra.

  To run the configure script type:



    ./configure



  The configure script attempts to use the C compiler specified by the
  CC environment variable. If not set, GNU C will be used if it is
  available. The CFLAGS environment variable holds options to be passed
  to the C compiler. If you're using a Bourne-compatible shell you may
  pass something like this:


         CC=/opt/ccs/bin/cc CFLAGS=-O ./configure



  To customize Zebra the configure script accepts a set of options. The
  most important are

     --prefix path
        Specifies installation prefix. This is only needed if you run
        make install later to perform a "system" installation. The
        prefix is /usr/local if not specified.

     --with-tclconfig=DIR
        If Tcl is installed on the system you can tell configure in
        which directory Tcl's tclConfig.sh is stored. The tclConfig.sh
        includes information about settings required to link with Tcl's
        libraries.  If you don't specify this option, configure will see
        if Tcl's shell tclsh is in your path and if it is, it will guess
        where the equivalent tclConfig.sh is located. If tclsh is not
        found in your path and this option is not given Zebra will not
        include Tcl support.

     --with-yazconfig=DIR
        This option allows you to specify the directory that contains
        YAZ's yaz-config.  This option is useful if you wish to compile
        Zebra with a specific version of YAZ. YAZ version 1.5 and later
        creates a script yaz-config that includes information on
        compiler settings needed to link with it.

  When configured, build the software by typing:


         make



  As an option you may type make depend to create source file
  dependencies for the package. This is only needed, however, if you
  modify the source code later.

  If successful, two executables have been created in the sub-directory
  bin.

     zebrasrv
        The Z39.50 server and search engine.

     zebraidx
        The administrative tool for the search index.



  The next step is optional and is only needed if you wish to install
  Zebra in system directories such as /usr/bin, /usr/lib, etc.

  To perform this step, type


         make install



  The executables will be installed in prefix/bin, and profile tables
  will be installed in prefix/lib/zebra/tab. Here prefix represents the
  prefix as specified -- default being /usr/local.


  2.2.  WIN32

  Zebra is shipped with "makefiles" for the NMAKE tool that comes with
  Visual C++.

  Start an MS-DOS prompt and switch to the subdirectory WIN where the file
  makefile is located. Customize the installation by editing the
  makefile file (for example by using wordpad).

  The following summarises the most important settings in that file.


     YAZDIR
        Specifies where YAZ is located.

     DEBUG
        If set to 1, the software is compiled with debugging libraries.
        If set to 0, the software is compiled with release (non-
        debugging) libraries.

     BZIP2
        A group of settings (BZIP2LIB,..)  that must be defined if BZIP2
        compression support is desired.

  When satisfied with the settings in the makefile type


       nmake



  If compilation was successful the executables zebraidx.exe and
  zebrasrv.exe are put in the sub directory BIN.


  3.  Quick Start

  In this section, we will test the system by indexing a small set of
  sample GILS records that are included with the software distribution.
  Go to the test/gils subdirectory of the distribution archive. There
  you will find a configuration file named zebra.cfg with the following
  contents:



  # Where the schema files, attribute files, etc. are located.
  profilePath: .:../../tab:../../../yaz/tab

  # Files that describe the attribute sets supported.
  attset: explain.att
  attset: bib1.att
  attset: gils.att



  Now, edit the file and set profilePath to the path of the YAZ profile
  tables (sub directory tab of the YAZ distribution archive).

  The 48 test records are located in the sub directory records.  To
  index these, type:


       $ ../../bin/zebraidx -t grs.sgml update records



  In the command above the option -t specifies the record type -- in
  this case grs.sgml. The word update followed by a directory root
  updates all files below that directory node.

  If your indexing command was successful, you are now ready to fire up
  a server. To start a server on port 2100, type:


       $ ../../bin/zebrasrv tcp:@:2100



  The Zebra index that you have just created has a single database named
  Default. The database contains records structured according to the
  GILS profile, and the server will return records in either XML,
  USMARC, GRS-1, or SUTRS depending on what your client asks for.

  To test the server, you can use any Z39.50 client (1992 or later). For
  instance, you can use the demo client that comes with YAZ: Just cd to
  the client subdirectory of the YAZ distribution and type:



       $ ./yaz-client tcp:localhost:2100



  When the client has connected, you can type:



       Z> find surficial
       Z> show 1



  The default retrieval syntax for the client is USMARC. To try other
  formats for the same record, try:


       Z> format sutrs
       Z> show 1
       Z> format grs-1
       Z> show 1
       Z> format xml
       Z> show 1
       Z> elements B
       Z> show 1



  NOTE: You may notice that more fields are returned when your client
  requests SUTRS or GRS-1 records. When retrieving GILS records, this is
  normal - not all of the GILS data elements have mappings in the USMARC
  record format.

  If you've made it this far, there's a good chance that you've got
  through the compilation OK.


  4.  Administrating Zebra


  To administrate Zebra, you run the zebraidx program. This program
  supports a number of options which are preceded by a minus, and a few
  commands (not preceded by minus).

  Both the Zebra administrative tool and the Z39.50 server share a set
  of index files and a global configuration file. The name of the
  configuration file defaults to zebra.cfg.  The configuration file
  includes specifications on how to index various kinds of records and
  where the other configuration files are located. zebrasrv and zebraidx
  must be run in the directory where the configuration file lives unless
  you indicate the location of the configuration file by option -c.


  4.1.  Record Types

  Indexing is a per-record process. Before a record is indexed, search
  keys are extracted from the original record, whatever its layout
  (SGML, HTML, text, etc.).  The Zebra system currently supports
  two fundamental types of records: structured and simple text.  To
  specify a particular extraction process, use either the command line
  option -t or specify a recordType setting in the configuration file.


  4.2.  The Zebra Configuration File

  The Zebra configuration file, read by zebraidx and zebrasrv, defaults
  to zebra.cfg unless another file is specified with the -c option.

  You can edit the configuration file with a normal text editor.
  Parameter names and values are separated by colons in the file. Lines
  starting with a hash sign (#) are treated as comments.

  If you manage different sets of records that share common
  characteristics, you can organize the configuration settings for each
  type into "groups".  When zebraidx is run and you wish to address a
  given group you specify the group name with the -g option. In this
  case settings that have the group name as their prefix will be used by
  zebraidx. If no -g option is specified, the settings with no prefix
  are used.

  In the configuration file, the group name is placed before the option
  name itself, separated by a dot (.). For instance, to set the record
  type for group public to grs.sgml (the SGML-like format for structured
  records) you would write:



       public.recordType: grs.sgml



  To set the default value of the record type to text write:



       recordType: text
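
  The file format and the group lookup described above can be sketched
  in a few lines of Python. This is only an illustration, not Zebra's
  own parser: the function names parse_zebra_cfg and lookup are
  hypothetical, and falling back to the unprefixed setting when a group
  has no value of its own is an assumption made for the example.

```python
def parse_zebra_cfg(text):
    """Parse 'name: value' lines; lines starting with '#' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as a
        # profilePath with several colon-separated directories survive.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line; ignored in this sketch
        # A setting such as attset may be repeated, so keep every value.
        settings.setdefault(name.strip(), []).append(value.strip())
    return settings


def lookup(settings, name, group=None):
    """Return the values for name, preferring 'group.name' if a group is given."""
    if group is not None:
        prefixed = settings.get(f"{group}.{name}")
        if prefixed:
            return prefixed
    # Assumption: fall back to the setting without a group prefix.
    return settings.get(name, [])


SAMPLE = """\
# Where the schema files are located.
profilePath: .:../../tab
attset: explain.att
attset: bib1.att
recordType: text
public.recordType: grs.sgml
"""

cfg = parse_zebra_cfg(SAMPLE)
print(lookup(cfg, "recordType", group="public"))  # ['grs.sgml']
print(lookup(cfg, "recordType"))                  # ['text']
```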



  The available configuration settings are summarized below. They will
  be explained further in the following sections.


     group.recordType[.name]
        Specifies how records with the file extension name should be
        handled by the indexer. This option may also be specified as a
        command line option (-t). Note that if you do not specify a
        name, the setting applies to all files. In general, the record
        type specifier consists of the elements (each element separated
        by dot), fundamental-type, file-read-type and arguments.
        Currently, two fundamental types exist, text and grs.

     group.recordId
        Specifies how the records are to be identified when updated. See
        section ``Locating Records''.

     group.database
        Specifies the Z39.50 database name.

     group.storeKeys
        Specifies whether key information should be saved for a given
        group of records. If you plan to update/delete this type of
        records later this should be specified as 1; otherwise it should
        be 0 (default), to save register space.

     group.storeData
        Specifies whether the records should be stored internally in the
        Zebra system files. If you want to maintain the raw records
        yourself, this option should be false (0). If you want Zebra to
        take care of the records for you, it should be true (1).

     lockDir
        Directory in which various lock files are stored.

     keyTmpDir
        Directory in which temporary files used during zebraidx's update
        phase are stored.

     setTmpDir
        Specifies the directory that the server uses for temporary
        result sets.  If not specified /tmp will be used.

     profilePath
        Specifies the location of profile specification files.


     attset
        Specifies the filename(s) of attribute set files for use in
        searching. At least the Bib-1 set should be loaded (bib1.att).
        The profilePath setting is used to look for the specified files.
        See section ``The Attribute Set (.att) Files''.

     memMax
        Specifies size of internal memory to use for the zebraidx
        program. The amount is given in megabytes - default is 4 (4 MB).

  4.3.  Locating Records

  The default behaviour of the Zebra system is to reference the records
  from their original location, i.e. where they were found when you ran
  zebraidx. That is, when a client wishes to retrieve a record following
  a search operation, the files are accessed from the place where you
  originally put them - if you remove the files (without running
  zebraidx again), the client will receive a diagnostic message.

  If your input files are not permanent - for example if you retrieve
  your records from an outside source, or if they were temporarily
  mounted on a CD-ROM drive, you may want Zebra to make an internal copy
  of them. To do this, you specify 1 (true) in the storeData setting.
  When the Z39.50 server retrieves the records they will be read from
  the internal file structures of the system.


  4.4.  Indexing example

  Consider a system in which you have a group of text files called
  simple. That group of records should belong to a Z39.50 database
  called textbase. The following zebra.cfg file will suffice:



       profilePath: /usr/lib/yaz/tab:/usr/lib/zebra/tab
       attset: explain.att
       attset: bib1.att
       simple.recordType: text
       simple.database: textbase



  5.  Running the Maintenance Interface (zebraidx)

  The following is a complete reference to the command line interface to
  the zebraidx application.

  Syntax


       $ zebraidx [options] command [directory] ...



  Options

     -t type
        Update all files as type. Currently, the types supported are
        text and grs.subtype. If no subtype is provided for the GRS
        (General Record Structure) type, the canonical input format is
        assumed (see section ``Local Representation''). Generally, it is
        probably advisable to specify the record types in the zebra.cfg
        file (see section ``Record Types''), to avoid confusion at
        subsequent updates.


     -c config-file
        Read the configuration file config-file instead of zebra.cfg.


     -g group
        Update the files according to the group settings for group (see
        section ``The Zebra Configuration File'').


     -d database
        The records located should be associated with the database name
        database for access through the Z39.50 server.


     -m mbytes
        Use mbytes megabytes of memory before flushing keys to background
        storage. This setting affects performance when updating large
        databases.


     -s Show analysis of the indexing process. The maintenance program
        works in a read-only mode and doesn't change the state of the
        index. This option is very useful when you wish to test a new
        profile.


     -V Show Zebra version.


     -v level
        Set the log level to level. level should be one of none, debug,
        or all.


  Commands

     update directory
        Update the register with the files contained in directory. If no
        directory is provided, a list of files is read from stdin. See
        section ``Administrating Zebra''.
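
  When zebraidx is driven from a script, it can help to assemble the
  argument list in one place. The following Python sketch follows the
  syntax line and option letters listed above; the helper name
  zebraidx_argv is hypothetical, and the sketch only builds the list
  without running anything.

```python
def zebraidx_argv(command, directory=None, *, record_type=None, config=None,
                  group=None, database=None, mbytes=None):
    """Build an argument list of the form: zebraidx [options] command [directory]."""
    argv = ["zebraidx"]
    # Options documented above: -t type, -c config-file, -g group,
    # -d database, -m mbytes. Options left as None are omitted.
    for flag, value in (("-t", record_type), ("-c", config), ("-g", group),
                        ("-d", database), ("-m", mbytes)):
        if value is not None:
            argv += [flag, str(value)]
    argv.append(command)
    if directory is not None:
        argv.append(directory)
    return argv


print(zebraidx_argv("update", "records", record_type="grs.sgml"))
# ['zebraidx', '-t', 'grs.sgml', 'update', 'records']
```

  The resulting list could be handed to subprocess.run from the
  directory that holds zebra.cfg, or with the config argument set.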



  6.  The Z39.50 Server

  6.1.  Running the Z39.50 Server (zebrasrv)

  Syntax


       zebrasrv [options] [listener-address ...]



  Options

     -a APDU file
        Specify a file for dumping PDUs (for diagnostic purposes).  The
        special name "-" sends output to stderr.


     -c config-file
        Read configuration information from config-file. The default
        configuration is ./zebra.cfg.


     -S Don't fork on connection requests. This can be useful for
        symbolic-level debugging. The server can only accept a single
        connection in this mode.


     -l logfile
        Specify an output file for the diagnostic messages. The default
        is to write this information to stderr.


     -v log-level
        The log level. Use a comma-separated list of members of the set
        {fatal,debug,warn,log,all,none}.


     -u username
        Set user ID. Sets the real UID of the server process to that of
        the given username. It's useful if you aren't comfortable with
        having the server run as root, but you need to start it as such
        to bind a privileged port.


     -w working-directory
        Change working directory.


     -i Run under the Internet superserver, inetd. Make sure you use the
        logfile option -l in conjunction with this mode and specify the
        -l option before any other options.


     -t timeout
        Set the idle session timeout (default 60 minutes).


     -k kilobytes
        Set the (approximate) maximum size of present response messages.
        Default is 1024 Kb (1 Mb).

  A listener-address consists of a transport mode followed by a colon
  (:) followed by a listener address. The transport mode is either osi
  or tcp.

  For TCP, an address has the form



       hostname | IP-number [: portnumber]



  The port number defaults to 210 (standard Z39.50 port).

  The special hostname "@" is mapped to the address INADDR_ANY, which
  causes the server to listen on any local interface. To start the
  server listening on the registered port for Z39.50, and to drop root
  privileges once the port is bound, execute the server like this (from
  a root shell):


       zebrasrv -u daemon tcp:@



  You can replace daemon with another user, e.g. your own account, or a
  dedicated IR server account.
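
  The listener-address format can be illustrated with a small Python
  sketch. It is not the server's own parser: the function name
  parse_listener is hypothetical, OSI addresses are returned unparsed
  because their form is not described here, and the special host "@" is
  passed through unchanged.

```python
def parse_listener(address, default_port=210):
    """Split 'mode:host[:port]' into a (mode, host, port) tuple."""
    mode, sep, rest = address.partition(":")
    if not sep or mode not in ("tcp", "osi"):
        raise ValueError(f"expected 'tcp:...' or 'osi:...', got {address!r}")
    if mode == "osi":
        # OSI addresses are not covered by this sketch.
        return mode, rest, None
    host, sep, port = rest.partition(":")
    if not host:
        raise ValueError(f"missing hostname in {address!r}")
    # Without an explicit port, fall back to the standard Z39.50 port.
    return mode, host, int(port) if sep else default_port


print(parse_listener("tcp:@:2100"))  # ('tcp', '@', 2100)
print(parse_listener("tcp:@"))       # ('tcp', '@', 210)
```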

  The default behavior for zebrasrv is to establish a single TCP/IP
  listener, for the Z39.50 protocol, on port 9999.


  6.2.  Z39.50 Protocol Support and Behavior

  6.2.1.  Initialization

  During initialization, the server will negotiate to version 3 of the
  Z39.50 protocol (unless the client specifies a lower version), and the
  option bits for Search, Present, Scan, NamedResultSets, and
  concurrentOperations will be set, if requested by the client. The
  maximum PDU size is negotiated down to a maximum of 1Mb by default.


  6.2.2.  Search

  The supported query types are 1 and 101. All operators are currently
  supported with the restriction that only proximity units of type
  "word" are supported for the proximity operator.  Queries can be
  arbitrarily complex.  Named result sets are supported, and result sets
  can be used as operands without limitations.  Searches may span
  multiple databases.

  The server has full support for piggy-backed present requests (see
  also the following section).

  Use attributes are interpreted according to the attribute sets which
  have been loaded in the zebra.cfg file, and are matched against
  specific fields as specified in the .abs file which describes the
  profile of the records which have been loaded. If no Use attribute is
  provided, a default of Bib-1 Any is assumed.

  If a Structure attribute of Phrase is used in conjunction with a
  Completeness attribute of Complete (Sub)field, the term is matched
  against the contents of the phrase (long word) register, if one exists
  for the given Use attribute.  A phrase register is created for those
  fields in the .abs file that contain a p-specifier.

  If Structure=Phrase is used in conjunction with Incomplete Field - the
  default value for Completeness, the search is directed against the
  normal word registers, but if the term contains multiple words, the
  term will only match if all of the words are found immediately
  adjacent, and in the given order.  The word search is performed on
  those fields that are indexed as type w in the .abs file.

  If the Structure attribute is Word List, Free-form Text, or Document
  Text, the term is treated as a natural-language, relevance-ranked
  query.  This search type uses the word register, i.e. those fields
  that are indexed as type w in the .abs file.

  If the Structure attribute is Numeric String the term is treated as an
  integer. The search is performed on those fields that are indexed as
  type n in the .abs file.

  If the Structure attribute is URx the term is treated as a URX (URL)
  entity. The search is performed on those fields that are indexed as
  type u in the .abs file.

  If the Structure attribute is Local Number the term is treated as a
  native Zebra Record Identifier.

  If the Relation attribute is Equals (default), the term is matched in
  a normal fashion (modulo truncation and processing of individual
  words, if required). If Relation is Less Than, Less Than or Equal,
  Greater Than, or Greater Than or Equal, the term is assumed to be
  numerical, and a standard regular expression is constructed to match
  the given expression. If Relation is Relevance, the standard natural-
  language query processor is invoked.

  For the Truncation attribute, No Truncation is the default.  Left
  Truncation is not supported. Process # is supported, as is Regxp-1.
  Regxp-2 enables the fault-tolerant (fuzzy) search. As a default, a
  single error (deletion, insertion, replacement) is accepted when terms
  are matched against the register contents.
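
  To make the notion of "a single error" concrete, here is a minimal
  Python sketch that tests whether two strings differ by at most one
  deletion, insertion, or replacement. It only illustrates the matching
  criterion; the function name within_one_error is hypothetical and
  this is not how Zebra searches its registers.

```python
def within_one_error(term, candidate):
    """True if candidate equals term after at most one single-character edit."""
    if term == candidate:
        return True
    if abs(len(term) - len(candidate)) > 1:
        return False
    # Skip the common prefix; i ends at the first differing position.
    i = 0
    while i < min(len(term), len(candidate)) and term[i] == candidate[i]:
        i += 1
    if len(term) == len(candidate):
        # One replacement: the remainders after position i must agree.
        return term[i + 1:] == candidate[i + 1:]
    if len(term) > len(candidate):
        # One character of term is missing from candidate.
        return term[i + 1:] == candidate[i:]
    # One extra character was inserted into candidate.
    return term[i:] == candidate[i + 1:]


print(within_one_error("retrieval", "retrival"))   # True  (one deletion)
print(within_one_error("retrieval", "retreival"))  # False (two replacements)
```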


  6.2.2.1.  Regular expressions


  Each term in a query is interpreted as a regular expression if the
  truncation value is either Regxp-1 (102) or Regxp-2 (103).  Both query
  types follow the same syntax with the operands:

     x  Matches the character x.

     .  Matches any character.

     [..]
        Matches the set of characters specified; such as [abc] or [a-c].

  and the operators:

     x* Matches x zero or more times. Priority: high.

     x+ Matches x one or more times. Priority: high.

     x? Matches x zero or one time. Priority: high.

     xy Matches x, then y. Priority: medium.

     x|y
        Matches either x or y. Priority: low.

  The order of evaluation may be changed by using parentheses.
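
  The operator priorities can be tried out with Python's re module,
  which gives the same meaning to the operands and operators shown in
  the examples below. This is only an analogy: Python's engine is not
  Zebra's, and matching the whole term (re.fullmatch) is an assumption
  made for the illustration.

```python
import re


def matches(pattern, term):
    """True if the whole term matches the pattern."""
    return re.fullmatch(pattern, term) is not None


print(matches("informat.*", "information"))  # True: '.' with '*'
print(matches("[a-c]+", "abca"))             # True: character set with '+'
print(matches("ab|cd", "cd"))                # True: either 'ab' or 'cd'
print(matches("ab|cd", "abd"))               # False: '|' binds weaker than sequence
print(matches("a(b|c)d", "abd"))             # True: parentheses change the grouping
```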

  If the first character of the Regxp-2 query is a plus character (+) it
  marks the beginning of a section with non-standard specifiers. The
  next plus character marks the end of the section.  Currently Zebra
  only supports one specifier, the error tolerance, which consists of
  one digit.

  Since the plus operator is normally a suffix operator the addition to
  the query syntax doesn't violate the syntax for standard regular
  expressions.
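
  As a sketch of how such a leading section could be separated from
  the rest of the term, consider the following Python function. The
  name split_error_tolerance and the sample term "+2+informat.*" are
  hypothetical and are inferred only from the description above; the
  default of 1 reflects the single error accepted by default.

```python
def split_error_tolerance(term, default=1):
    """Return (tolerance, pattern) for a term with an optional '+d+' prefix."""
    if term.startswith("+"):
        # The next plus character closes the specifier section.
        end = term.find("+", 1)
        spec = term[1:end] if end != -1 else ""
        if len(spec) == 1 and spec in "0123456789":
            return int(spec), term[end + 1:]
        raise ValueError(f"malformed specifier section in {term!r}")
    # No specifier section: keep the default tolerance.
    return default, term


print(split_error_tolerance("+2+informat.*"))  # (2, 'informat.*')
print(split_error_tolerance("informat.*"))     # (1, 'informat.*')
```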


  6.2.2.2.  Query examples


  Phrase search for information retrieval in the title-register:

   @attr 1=4 "information retrieval"


  Ranked search for the same thing:

   @attr 1=4 @attr 2=102 "Information retrieval"



  Phrase search with a regular expression:

   @attr 1=4 @attr 5=102 "informat.* retrieval"



  Ranked search with a regular expression:

   @attr 1=4 @attr 5=102 @attr 2=102 "informat.* retrieval"



  In the GILS schema (gils.abs), the west-bounding-coordinate is indexed
  as type n, and is therefore searched by specifying structure=Numeric
  String.  To match all those records with west-bounding-coordinate
  greater than -114 we use the following query:

   @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
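
  Queries in the notation used above are easy to compose from a
  program. The following Python sketch builds such strings from
  attribute tuples; the helper name pqf is hypothetical, and terms
  that themselves contain double quotes are not handled.

```python
def pqf(term, attrs):
    """Compose a query from (type, value) or (set, type, value) tuples."""
    parts = []
    for attr in attrs:
        if len(attr) == 3:
            # Attribute taken from a named attribute set, e.g. gils.
            attset, atype, value = attr
            parts.append(f"@attr {attset} {atype}={value}")
        else:
            atype, value = attr
            parts.append(f"@attr {atype}={value}")
    # Quote the term when it contains a space so it stays one operand.
    parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


print(pqf("information retrieval", [(1, 4)]))
# @attr 1=4 "information retrieval"
print(pqf("-114", [(4, 109), (2, 5), ("gils", 1, 2038)]))
# @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
```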



  6.2.3.  Present

  The present facility is supported in a standard fashion. The requested
  record syntax is matched against the ones supported by the profile of
  each record retrieved. If no record syntax is given, SUTRS is the
  default. The requested element set name, again, is matched against any
  provided by the relevant record profiles.


  6.2.4.  Scan

  The attribute combinations provided with the TermListAndStartPoint are
  processed in the same way as operands in a query (see above).
  Currently, only the term and the globalOccurrences are returned with
  the TermInfo structure.


  6.2.5.  Sort

  Z39.50 specifies three different types of sort criteria.  Of these
  Zebra supports the attribute specification type in which case the use
  attribute specifies the "Sort register".  Sort registers are created
  for those fields that are of type "sort" in the default.idx file.  The
  corresponding character mapping file in default.idx specifies the
  ordinal of each character used in the actual sort.

  Z39.50 allows the client to specify sorting on one or more input
  result sets and one output result set.  Zebra supports sorting on one
  result set only which may or may not be the same as the output result
  set.


  6.2.6.  Close

  If a Close PDU is received, the server will respond with a Close PDU
  with reason=FINISHED, no matter which protocol version was negotiated
  during initialization. If the protocol version is 3 or more, the
  server will generate a Close PDU under certain circumstances,
  including a session timeout (60 minutes by default), and certain kinds
  of protocol errors. Once a Close PDU has been sent, the protocol
  association is considered broken, and the transport connection will be
  closed immediately upon receipt of further data, or following a short
  timeout.


  7.  The Record Model

  Zebra is designed to support a wide range of data management
  applications. The system can be configured to handle virtually any
  kind of structured data. Each record in the system is associated with
  a record schema which lends context to the data elements of the
  record. Any number of record schema can coexist in the system.
  Although it may be wise to use only a single schema within one
  database, the system poses no such restrictions.

  The record model described in this chapter applies to the fundamental,
  structured record type grs as introduced in section ``Record Types''.

  Records pass through three different states during processing in the
  system.


  o  When records are accessed by the system, they are represented in
     their local, or native format. This might be SGML or HTML files,
     News or Mail archives, or MARC records. If the system doesn't already
     know how to read the type of data you need to store, you can set up
     an input filter by preparing conversion rules based on regular
     expressions and possibly augmented by a flexible scripting language
     (Tcl). The input filter produces as output an internal
     representation:

  o  When records are processed by the system, they are represented in a
     tree-structure, constructed by tagged data elements hanging off a
     root node. The tagged elements may contain data or yet more tagged
     elements in a recursive structure. The system performs various
     actions on this tree structure (indexing, element selection, schema
     mapping, etc.),

  o  Before transmitting records to the client, they are first converted
     from the internal structure to a form suitable for exchange over
     the network - according to the Z39.50 standard.


  7.1.  Local Representation

  As mentioned earlier, Zebra places few restrictions on the type of
  data that you can index and manage. Generally, whatever the form of
  the data, it is parsed by an input filter specific to that format, and
  turned into an internal structure that Zebra knows how to handle. This
  process takes place whenever the record is accessed - for indexing and
  retrieval.


  The RecordType parameter in the zebra.cfg file, or the -t option to
  the indexer, tells Zebra how to process input records. Two basic types
  of processing are available - raw text and structured data. Raw text
  is just that, and it is selected by providing the argument text to
  Zebra. Structured records are all handled internally using the basic
  mechanisms described in the subsequent sections. Zebra can read
  structured records in many different formats. How this is done is
  governed by additional parameters after the "grs" keyword, separated
  by "." characters.


  Four basic subtypes to the grs type are currently available:


     grs.sgml
        This is the canonical input format -- described below. It is a
        simple SGML-like syntax.


     grs.regx.filter
        This enables a user-supplied input filter. The mechanisms of
        these filters are described below.


     grs.tcl.filter
        This enables a user-supplied input filter with Tcl rules (only
        available if Zebra is compiled with Tcl support).


     grs.marc.abstract syntax
        This allows Zebra to read records in the ISO2709 (MARC) encoding
        standard. In this case, the last parameter abstract syntax
        names the .abs file (see below) which describes the specific
        MARC structure of the input record as well as the indexing
        rules.
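
  As a rough illustration of how such a record type string breaks down
  into its parts, consider the Python sketch below. The function name
  split_record_type is hypothetical and not part of Zebra, and the
  name usmarc is made up for the example. The sketch assumes that
  everything after the second "." forms a single argument.

```python
def split_record_type(record_type):
    """Split e.g. 'grs.regx.myfilter' into (type, subtype, argument)."""
    # Split at most twice, so that any further dots stay in the argument.
    parts = record_type.split(".", 2)
    # Pad with None so that 'text' yields ('text', None, None).
    parts += [None] * (3 - len(parts))
    return tuple(parts)


print(split_record_type("text"))             # raw text, no subtype
print(split_record_type("grs.sgml"))         # canonical input format
print(split_record_type("grs.marc.usmarc"))  # 'usmarc' is a made-up name
```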


  7.1.1.  Canonical Input Format

  Although input data can take any form, it is sometimes useful to
  describe the record processing capabilities of the system in terms of
  a single, canonical input format that gives access to the full
  spectrum of structure and flexibility in the system. In Zebra, this
  canonical format is an "SGML-like" syntax.

  To use the canonical format, specify grs.sgml as the record type.

  Consider a record describing an information resource (such a record is
  sometimes known as a locator record). It might contain a field
  describing the distributor of the information resource, which might in
  turn be partitioned into various fields providing details about the
  distributor, like this:



       <Distributor>
           <Name> USGS/WRD </Name>
           <Organization> USGS/WRD </Organization>
           <Street-Address>
               U.S. GEOLOGICAL SURVEY, 505 MARQUETTE, NW
           </Street-Address>
           <City> ALBUQUERQUE </City>
           <State> NM </State>
           <Zip-Code> 87102 </Zip-Code>
           <Country> USA </Country>
           <Telephone> (505) 766-5560 </Telephone>
       </Distributor>



  NOTE: The indentation above merely illustrates how Zebra interprets
  the markup. The indentation, in itself, has no significance to the
  parser for the canonical input format, which discards superfluous
  whitespace.


  The keywords surrounded by <...> are tags, while the sections of text
  in between are the data elements. A data element is characterized by
  its location in the tree that is made up by the nested elements. Each
  element is terminated by a closing tag - beginning with </, and
  containing the same symbolic tag-name as the corresponding opening
  tag. The general closing tag - </> - terminates the element started by
  the last opening tag. The structuring of elements is significant. The
  element Telephone, for instance, may be indexed and presented to the
  client differently, depending on whether it appears inside the
  Distributor element, or some other structured data element such as a
  Supplier element.
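
  To make the tag and data element rules more concrete, the following
  Python sketch turns a fragment of the canonical format into a nested
  tree. It is only an illustration under simplifying assumptions, not
  Zebra's parser: the name parse_canonical is hypothetical, variant
  tags are treated like ordinary tags, and a named closing tag must
  match the most recently opened element.

```python
import re

# A tag (<name>, </name> or </>) or a run of character data.
TOKEN = re.compile(r"<(/?)([^<>]*)>|([^<>]+)")


def parse_canonical(text):
    """Return the record as a list of nested (tag, children) tuples."""
    root = ("", [])
    stack = [root]                          # currently open elements
    for match in TOKEN.finditer(text):
        closing, name, data = match.groups()
        if data is not None:
            data = " ".join(data.split())   # discard superfluous whitespace
            if data:
                stack[-1][1].append(data)
        elif closing:
            if len(stack) == 1:
                raise ValueError("closing tag without an open element")
            # </> closes the last opened element; </name> must match it.
            if name and name != stack[-1][0]:
                raise ValueError("unexpected closing tag: " + name)
            stack.pop()
        else:
            node = (name.strip(), [])
            stack[-1][1].append(node)
            stack.append(node)
    return root[1]


record = parse_canonical("""
    <Distributor>
        <Name> USGS/WRD </Name>
        <City> ALBUQUERQUE </>
    </Distributor>
""")
print(record)
```

  Note how the City element is closed by the general closing tag, and
  how both data elements end up below the Distributor node.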


  7.1.1.1.  Record Root

  The first tag in a record describes the root node of the tree that
  makes up the total record. In the canonical input format, the root tag
  should contain the name of the schema that lends context to the
  elements of the record (see section ``Internal Representation'').
  Strictly speaking, the GILS record below is illegal, since it contains
  only a single element while the GILS profile includes several
  mandatory elements. Zebra does not validate the contents of a record
  against the Z39.50 profile, however; it merely attempts to match up
  elements of a local representation with the given schema:



       <gils>
           <title>Zen and the Art of Motorcycle Maintenance</title>
       </gils>



  7.1.1.2.  Variants

  Zebra allows you to provide individual data elements in a number of
  variant forms. Examples of variant forms are textual data elements
  which might appear in different languages, and images which may appear
  in different formats or layouts. The variant system in Zebra is
  essentially a representation of the variant mechanism of Z39.50-1995.

  The following is an example of a title element which occurs in two
  different languages.



       <title>
         <var lang lang "eng">
           Zen and the Art of Motorcycle Maintenance</>
         <var lang lang "dan">
           Zen og Kunsten at Vedligeholde en Motorcykel</>
       </title>



  The syntax of the variant element is <var class type value>. The
  available values for the class and type fields are given by the
  variant set that is associated with the current schema (see section
  ``Variant Set File'').
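
  The three fields of a variant tag can be picked apart as in the
  Python sketch below. The helper name parse_var_tag is hypothetical,
  and the sketch assumes that the value is either a single word or
  enclosed in double quotes; it is not taken from the Zebra sources.

```python
import shlex


def parse_var_tag(tag):
    """Split '<var class type value>' into a (class, type, value) triple."""
    # shlex honours the double quotes around the value.
    words = shlex.split(tag.strip()[1:-1])
    if len(words) != 4 or words[0] != "var":
        raise ValueError("not a variant tag: " + tag)
    return tuple(words[1:])


print(parse_var_tag('<var lang lang "eng">'))
print(parse_var_tag('<var body iana "text/plain">'))
```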

  Variant elements are terminated by the general end-tag </>, by the
  variant end-tag </var>, by the appearance of another variant tag with
  the same class and value settings, or by the appearance of another,
  normal tag. In other words, the end-tags for the variants used in the
  example above could have been omitted.

  Variant elements can be nested. The element



       <title>
         <var lang lang "eng"><var body iana "text/plain">
           Zen and the Art of Motorcycle Maintenance
       </title>



  associates two variant components with the variant list for the title
  element.

  Given the nesting rules described above, we could write



       <title>
         <var body iana "text/plain">
           <var lang lang "eng">
             Zen and the Art of Motorcycle Maintenance
           <var lang lang "dan">
             Zen og Kunsten at Vedligeholde en Motorcykel
       </title>



  The title element above comes in two variants. Both have the IANA body
  type "text/plain", but one is in English, and the other in Danish. The
  client, using the element selection mechanism of Z39.50, can retrieve
  information about the available variant forms of data elements, or it
  can select specific variants based on the requirements of the end-
  user.


  7.1.2.  Input Filters

  In order to handle general input formats, Zebra allows the operator to
  define filters which read individual records in their native format
  and produce an internal representation that the system can work with.

  Input filters are ASCII files, generally with the suffix .flt.  The
  system looks for the files in the directories given in the profilePath
  setting in the zebra.cfg file. The record type for the filter is
  grs.regx.filter-filename (fundamental type grs, file read type regx,
  argument filter-filename).

  Generally, an input filter consists of a sequence of rules, where each
  rule consists of a sequence of expressions, followed by an action. The
  expressions are evaluated against the contents of the input record,
  and the actions normally contribute to the generation of an internal
  representation of the record.

  An expression can be either of the following:


     INIT
        The action associated with this expression is evaluated exactly
        once in the lifetime of the application, before any records are
        read. It can be used in conjunction with an action that
        initializes tables or other resources that are used in the
        processing of input records.


     BEGIN
        Matches the beginning of the record. It can be used to
        initialize variables, etc. Typically, the BEGIN rule is also
        used to establish the root node of the record.


     END
        Matches the end of the record - when all of the contents of the
        record have been processed.


     /pattern/
        Matches a string of characters from the input record.


     BODY
        This keyword may only be used between two patterns. It matches
        everything between (not including) those patterns.


     FINISH
        The action associated with this expression is evaluated once,
        before the application terminates. It can be used to release
        system resources - typically ones allocated in the INIT step.


  An action is surrounded by curly braces ({...}), and consists of a
  sequence of statements. Statements may be separated by newlines or
  semicolons (;). Within actions, the strings that matched the
  expressions immediately preceding the action can be referred to as $0,
  $1, $2, etc.

  The available statements are:



     begin type [parameter ... ]
        Begin a new data element. The type is one of the following:

        record
           Begin a new record. The following parameter should be the
           name of the schema that describes the structure of the record,
           e.g. gils or wais (see below). The begin record call should
           precede any other use of the begin statement.


        element
           Begin a new tagged element. The parameter is the name of the
           tag. If the tag is not matched anywhere in the tagsets
           referenced by the current schema, it is treated as a local
           string tag.


        variant
           Begin a new node in a variant tree. The parameters are class
           type value.



     data
        Create a data element. The concatenated arguments make up the
        value of the data element. The option -text signals that the
        layout (whitespace) of the data should be retained for
        transmission. The option -element tag wraps the data up in the
        tag. The use of the -element option is equivalent to preceding
        the command with a begin element command, and following it with
        the end command.


     end [type]
        Close a tagged element. If no parameter is given, the last
        element on the stack is terminated. The first parameter, if any,
        is a type name, similar to the begin statement. For the element
        type, a tag name can be provided to terminate a specific tag.


  The following input filter reads a Usenet news file, producing a
  record in the WAIS schema. Note that the body of a news posting is
  separated from the list of headers by a blank line (or rather a
  sequence of two newline characters).



       BEGIN                { begin record wais }

       /^From:/ BODY /$/    { data -element name $1 }
       /^Subject:/ BODY /$/ { data -element title $1 }
       /^Date:/ BODY /$/    { data -element lastModified $1 }
       /\n\n/ BODY END      {
                               begin element bodyOfDisplay
                               begin variant body iana "text/plain"
                               data -text $1
                               end record
                            }
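
  For readers who find the rule syntax unfamiliar, the Python sketch
  below mimics what the filter above extracts from a posting. It is an
  approximation for illustration only: the name news_to_wais and the
  sample posting are made up, surrounding whitespace is stripped from
  the header values (an assumption), and elements are emitted in a
  fixed order rather than in the order in which they occur in the
  input.

```python
import re

# Header name in the posting -> element name in the WAIS record.
HEADERS = (("From", "name"), ("Subject", "title"), ("Date", "lastModified"))


def news_to_wais(posting):
    """Return (schema, elements) for one news posting."""
    # The first blank line separates the headers from the body.
    head, _, body = posting.partition("\n\n")
    elements = []
    for header, tag in HEADERS:
        match = re.search(rf"^{header}:(.*)$", head, re.MULTILINE)
        if match:
            elements.append((tag, match.group(1).strip()))
    elements.append(("bodyOfDisplay", body))
    return ("wais", elements)


posting = (
    "From: someone@example.org\n"
    "Subject: Zen\n"
    "Date: 1 Jan 1999\n"
    "\n"
    "First line.\nSecond line.\n"
)
print(news_to_wais(posting))
```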



  If Zebra is compiled with support for Tcl (Tool Command Language)
  enabled, the statements described above are supplemented with a
  complete scripting environment, including control structures
  (conditional expressions and loop constructs), and powerful string
  manipulation mechanisms for modifying the elements of a record. Tcl is
  a popular scripting environment, with several tutorials available both
  online and in hardcopy.

  NOTE: Variant support is not currently available in the input filter,
  but will be included with one of the next releases.


  7.2.  Internal Representation

  When records are manipulated by the system, they're represented in a
  tree-structure, with data elements at the leaf nodes, and tags or
  variant components at the non-leaf nodes. The root-node identifies the
  schema that lends context to the tagging and structuring of the
  record. Imagine a simple record, consisting of a 'title' element and
  an 'author' element:



               TITLE     "Zen and the Art of Motorcycle Maintenance"
       ROOT
               AUTHOR    "Robert Pirsig"



  A slightly more complex record would have the author element consist
  of two elements, a surname and a first name:



               TITLE     "Zen and the Art of Motorcycle Maintenance"
       ROOT
                         FIRST-NAME "Robert"
               AUTHOR
                         SURNAME    "Pirsig"



  The root of the record will refer to the record schema that describes
  the structuring of this particular record. The schema defines the
  element tags (TITLE, FIRST-NAME, etc.) that may occur in the record,
  as well as the structuring (SURNAME should appear below AUTHOR, etc.).
  In addition, the schema establishes element set names that are used by
  the client to request a subset of the elements of a given record. The
  schema may also establish rules for converting the record to a
  different schema, by stating, for each element, a mapping to a
  different tag path.


  7.2.1.  Tagged Elements

  A data element is characterized by its tag, and its position in the
  structure of the record. For instance, while the tag "telephone
  number" may be used in different places in a record, we may need to
  distinguish between these occurrences, both for searching and
  presentation purposes. While the phone numbers for the "customer" and
  the "service provider" both represent the same type of resource (a
  telephone number), it is essential that they be kept separate. The
  record schema provides the structure of the record, and names each
  data element (defined by the sequence of tags - the tag path - by
  which the element can be reached from the root of the record).
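
  The idea of a tag path can be sketched in a few lines of Python. The
  record below, its tag names and its placeholder phone numbers are
  invented for the example, and the function leaf_paths is a
  hypothetical helper rather than anything provided by Zebra. It walks
  the tree and reports each data element together with the sequence of
  tags leading to it.

```python
def leaf_paths(children, prefix=()):
    """Yield (tag path, data) for every data element below the root."""
    for child in children:
        if isinstance(child, str):
            yield "/".join(prefix), child
        else:
            tag, grandchildren = child
            yield from leaf_paths(grandchildren, prefix + (tag,))


# Made-up record: the same tag occurs below two different parents.
record = ("ROOT", [
    ("CUSTOMER", [("TELEPHONE", ["555-0100"])]),
    ("PROVIDER", [("TELEPHONE", ["555-0199"])]),
])

for path, data in leaf_paths(record[1]):
    print(path, "=", data)
```

  Although both leaves carry the tag TELEPHONE, their tag paths differ,
  which is what allows them to be indexed and presented separately.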


  7.2.2.  Variants

  The children of a tag node may be either more tag nodes, a data node
  (possibly accompanied by tag nodes), or a tree of variant nodes. The
  children of variant nodes are either more variant nodes or a data
  node (possibly accompanied by more variant nodes). Each leaf node,
  which is normally a data node, corresponds to a variant form of the
  tagged element identified by the tag which parents the variant tree.
  The following title element occurs in two different languages:



             VARIANT LANG=ENG  "War and Peace"
       TITLE
             VARIANT LANG=DAN  "Krig og Fred"



  Which of the two elements is transmitted to the client by the server
  depends on the specifications provided by the client, if any.

  In practice, each variant node is associated with a triple of class,
  type, value, corresponding to the variant mechanism of Z39.50.


  7.2.3.  Data Elements

  Data nodes have no children (they are always leaf nodes in the record
  tree).

  NOTE: Documentation needs extension here about types of nodes -
  numerical, textual, etc., plus the various types of inclusion notes.


  7.3.  Configuring Your Data Model

  The following sections describe the configuration files that govern
  the internal management of data records. The system searches for the
  files in the directories specified by the profilePath setting in the
  zebra.cfg file.


  7.3.1.  About Object Identifiers

  When Object Identifiers (or OIDs) need to be specified in the
  following sections, either a named OID reference or a raw OID
  reference may be used. For the named OIDs, refer to the source file
  util/oid.c from YAZ. The raw canonical OIDs are specified in
  dot-notation (for example 1.2.840.10003.3.1000.81.1).
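
  Telling the two kinds of reference apart is straightforward, as the
  Python sketch below suggests. The function is_raw_oid is a
  hypothetical helper, and the test it applies (two or more decimal
  numbers separated by dots) is an assumption about what counts as
  dot-notation, not a rule taken from YAZ.

```python
import re

# Two or more decimal numbers separated by single dots.
RAW_OID = re.compile(r"\d+(\.\d+)+")


def is_raw_oid(reference):
    """True if the reference is written in dot-notation."""
    return RAW_OID.fullmatch(reference) is not None


print(is_raw_oid("1.2.840.10003.3.1000.81.1"))
print(is_raw_oid("GILS-schema"))
```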


  7.3.2.  The Abstract Syntax

  The abstract syntax definition (also known as an Abstract Record
  Structure, or ARS) is the focal point of the record schema
  description. For a given schema, the ABS file may state any or all of
  the following:


  o  The object identifier of the Z39.50 schema associated with the ARS,
     so that it can be referred to by the client.

  o  The attribute set (which can possibly be a compound of multiple
     sets) which applies in the profile. This is used when indexing and
     searching the records belonging to the given profile.

  o  The Tag set (again, this can consist of several different sets).
     This is used when reading the records from a file, to recognize the
     different tags, and when transmitting the record to the client -
     mapping the tags to their numerical representation, if they are
     known.

  o  The variant set which is used in the profile. This provides a
     vocabulary for specifying the forms of data that appear inside the
     records.

  o  Element set names, which are a shorthand way for the client to ask
     for a subset of the data elements contained in a record. Element
     set names, in the retrieval module, are mapped to element
     specifications, which contain information equivalent to the Espec-1
     syntax of Z39.50.

  o  Map tables, which may specify mappings to other database profiles,
     if desired.

  o  Possibly, a set of rules describing the mapping of elements to a
     MARC representation.

  o  A list of element descriptions (this is the actual ARS of the
     schema, in Z39.50 terms), which lists the ways in which the various
     tags can be used and organized hierarchically.

  Several of the entries above simply refer to other files, which
  describe the given objects.


  7.3.3.  The Configuration Files

  This section describes the syntax and use of the various tables which
  are used by the retrieval module.

  The number of different file types may appear daunting at first, but
  each type corresponds fairly clearly to a single aspect of the Z39.50
  retrieval facilities. Further, the average database administrator, who
  is simply reusing an existing profile for which tables already exist,
  shouldn't have to worry too much about the contents of these tables.

  Generally, the files are simple ASCII files, which can be maintained
  using any text editor. Blank lines, and lines beginning with a (#) are
  ignored. Any characters on a line following a (#) are also ignored.
  All other lines contain directives, which provide some setting or
  value to the system. Generally, settings are characterized by a single
  keyword, identifying the setting, followed by a number of parameters.
  Some settings are repeatable (r), while others may occur only once in
  a file. Some settings are optional (o), while others again are
  mandatory (m).
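
  The general line format can be summarized by the Python sketch below,
  which reduces a table to a list of keywords and parameters. The
  function parse_directives is hypothetical, and splitting parameters
  on whitespace is an assumption made for the illustration; the sketch
  does not check which settings are mandatory or repeatable.

```python
def parse_directives(text):
    """Return a list of (keyword, [parameters]) for each directive line."""
    directives = []
    for line in text.splitlines():
        # Everything from a '#' to the end of the line is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue                    # blank or comment-only line
        keyword, *parameters = line.split()
        directives.append((keyword, parameters))
    return directives


sample = """
name gils
# Element set names
esetname VARIANT gils-variant.est  # for WAIS-compliance
esetname F @
"""
for keyword, parameters in parse_directives(sample):
    print(keyword, parameters)
```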


  7.3.4.  The Abstract Syntax (.abs) Files

  The name of this file type is slightly misleading in Z39.50 terms,
  since, apart from the actual abstract syntax of the profile, it also
  includes most of the other definitions that go into a database
  profile.

  When a record in the canonical, SGML-like format is read from a file
  or from the database, the first tag of the file should reference the
  profile that governs the layout of the record. If the first tag of the
  record is, say, <gils>, the system will look for the profile
  definition in the file gils.abs. Profile definitions are cached, so
  they only have to be read once during the lifespan of the current
  process.

  When writing your own input filters, the begin record statement
  introduces the profile, and should always be called first when
  introducing a new record.

  The file may contain the following directives:


     name symbolic-name
        (m) This provides a shorthand name or description for the
        profile. Mostly useful for diagnostic purposes.


     reference OID-name
        (m) The OID for the profile (name or dotted-numerical list).


     attset filename
        (m) The attribute set that is used for indexing and searching
        records belonging to this profile.


     tagset filename [type]
        (o) The tag set (if any) that describes the fields of the
        records. The type, which is optional, specifies the tag type. If
        not given, the type-specifier in the Tag Set files is used.


     varset filename
        (o) The variant set used in the profile.


     maptab filename
        (o,r) This points to a conversion table that might be used if
        the client asks for the record in a different schema from the
        native one.


     marc filename
        (o) Points to a file containing parameters for representing the
        record contents in the ISO2709 syntax. Read the description of
        the MARC representation facility below.


     esetname name filename
        (o,r) Associates the given element set name with an element
        selection file. If an (@) is given in place of the filename,
        this corresponds to a null mapping for the given element set
        name.


     any tags
        (o) This directive specifies a list of attributes which should
        be appended to the attribute list given for each element. The
        effect is to make every single element in the abstract syntax
        searchable by way of the given attributes. This directive
        provides an efficient way of supporting free-text searching
        across all elements. However, it does increase the size of the
        index significantly. The attributes can be qualified with a
        structure, as in the elm directive below.


     elm path name attributes
        (o,r) Adds an element to the abstract record syntax of the
        schema. The path follows the syntax which is suggested by the
        Z39.50 document - that is, a sequence of tags separated by
        slashes (/). Each tag is given as a comma-separated pair of tag
        type and -value surrounded by parentheses. The name is the name
        of the element, and attributes is a comma-separated list of the
        attributes to use when indexing the element. A ! in place of the
        attribute name is equivalent to specifying an attribute name
        identical to the element name. A - in place of the attribute
        name specifies that no indexing is to take place for the given
        element. The attributes can be qualified with field types to
        specify which character set should govern the indexing procedure
        for that field. The same data element may be indexed into
        several different fields, using different character set
        definitions. See the section ``Field Structure and Character
        Sets''. The default field type is "w" for word.

  The following is an excerpt from the abstract syntax file for the GILS
  profile.



  name gils
  reference GILS-schema
  attset gils.att
  tagset gils.tag
  varset var1.var

  maptab gils-usmarc.map

  # Element set names

  esetname VARIANT gils-variant.est  # for WAIS-compliance
  esetname B gils-b.est
  esetname G gils-g.est
  esetname F @

  elm (1,10)              rank                        -
  elm (1,12)              url                         -
  elm (1,14)              localControlNumber     Local-number
  elm (1,16)              dateOfLastModification Date/time-last-modified
  elm (2,1)               Title                       w:!,p:!
  elm (4,1)               controlIdentifier      Identifier-standard
  elm (2,6)               abstract               Abstract
  elm (4,51)              purpose                     !
  elm (4,52)              originator                  -
  elm (4,53)              accessConstraints           !
  elm (4,54)              useConstraints              !
  elm (4,70)              availability                -
  elm (4,70)/(4,90)       distributor                 -
  elm (4,70)/(4,90)/(2,7) distributorName             !
  elm (4,70)/(4,90)/(2,10) distributorOrganization    !
  elm (4,70)/(4,90)/(4,2) distributorStreetAddress    !
  elm (4,70)/(4,90)/(4,3) distributorCity             !
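
  The path notation used by the elm directive can be decoded as in the
  Python sketch below. The helper parse_tag_path is hypothetical, and
  the sketch assumes that a path is a slash-separated sequence of
  parenthesized type,value pairs with a numeric tag type; numeric
  values are converted to integers and anything else is kept as a
  string.

```python
import re

# One tag: "(type,value)" with a numeric type.
TAG = re.compile(r"\((\d+),([^(),/]+)\)")


def parse_tag_path(path):
    """Turn '(4,70)/(4,90)' into [(4, 70), (4, 90)]."""
    tags = []
    for part in path.split("/"):
        match = TAG.fullmatch(part)
        if match is None:
            raise ValueError("malformed tag in path: " + part)
        value = match.group(2)
        tags.append((int(match.group(1)),
                     int(value) if value.isdigit() else value))
    return tags


print(parse_tag_path("(2,1)"))
print(parse_tag_path("(4,70)/(4,90)/(2,7)"))
```

  A path whose parentheses do not balance is rejected with a
  ValueError, which makes this kind of check a cheap way to catch
  typing errors in a table.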



  7.3.5.  The Attribute Set (.att) Files

  This file type describes the Use elements of an attribute set.  It
  contains the following directives.



     name symbolic-name
        (m) This provides a shorthand name or description for the
        attribute set. Mostly useful for diagnostic purposes.


     reference OID-name
        (m) The reference name of the OID for the attribute set.


     include filename
        (o,r) This directive is used to include another attribute set as
        a part of the current one. This is used when a new attribute set
        is defined as an extension to another set. For instance, many
        new attribute sets are defined as extensions to the bib-1 set.
        This is an important feature of the retrieval system of Z39.50:
        it ensures the highest possible level of interoperability,
        because those access points of your database which are derived
        from the external set (say, bib-1) can be used even by clients
        who are unaware of the new set.



     att att-value att-name [local-value]
        (o,r) This repeatable directive introduces a new attribute to
        the set. The attribute value is stored in the index (unless a
        local-value is given, in which case this is stored). The name is
        used to refer to the attribute from the abstract syntax.

  This is an excerpt from the GILS attribute set definition. Notice how
  the file describing the bib-1 attribute set is referenced.



       name gils
       reference GILS-attset
       include bib1.att

       att 2001                distributorName
       att 2002                indexTermsControlled
       att 2003                purpose
       att 2004                accessConstraints
       att 2005                useConstraints



  7.3.6.  The Tag Set (.tag) Files

  This file type defines the tagset of the profile, possibly by
  referencing other tag sets (most tag sets, for instance, will include
  tagsetG and tagsetM from the Z39.50 specification). The file may
  contain the following directives.


     name symbolic-name
        (m) This provides a shorthand name or description for the tag
        set. Mostly useful for diagnostic purposes.


     reference OID-name
        (o) The reference name of the OID for the tag set. The directive
        is optional, since not all tag sets are registered outside of
        their schema.


     type integer
        (m) The type number of the tagset within the schema profile
        (note: this specification really should belong to the .abs file.
        This will be fixed in a future release).


     include filename
        (o,r) This directive is used to include the definitions of other
        tag sets into the current one.


     tag number names type
        (o,r) Introduces a new tag to the set. The number is the tag
        number as used in the protocol (there is currently no mechanism
        for specifying string tags at this point, but this would be
        quick work to add). The names parameter is a list of names by
        which the tag should be recognized in the input file format. The
        names should be separated by slashes (/). The type is the
        recommended datatype of the tag. It should be one of the
        following:


        o  structured

        o  string

        o  numeric

        o  bool

        o  oid

        o  generalizedtime

        o  intunit

        o  int

        o  octetstring

        o  null

  The following is an excerpt from the TagsetG definition file.



       name tagsetg
       reference TagsetG
       type 2

       tag     1       title           string
       tag     2       author          string
       tag     3       publicationPlace string
       tag     4       publicationDate string
       tag     5       documentId      string
       tag     6       abstract        string
       tag     7       name            string
       tag     8       date            generalizedtime
       tag     9       bodyOfDisplay   string
       tag     10      organization    string



  7.3.7.  The Variant Set (.var) Files

  The variant set file is a straightforward representation of the
  variant set definitions associated with the protocol. At present, only
  the Variant-1 set is known.

  These are the directives allowed in the file.


     name symbolic-name
        (m) This provides a shorthand name or description for the
        variant set. Mostly useful for diagnostic purposes.


     reference OID-name
        (o) The reference name of the OID for the variant set, if one is
        required.


     class integer class-name
        (m,r) Introduces a new class to the variant set.


     type integer type-name datatype
        (m,r) Adds a new type to the current class (the one introduced
        by the most recent class directive). The type names belong to
        the same name space as the one used in the tag set definition
        file.

  The following is an excerpt from the file describing the variant set
  Variant-1.



       name variant-1
       reference Variant-1

       class 1 variantId

         type  1       variantId               octetstring

       class 2 body

         type  1       iana                    string
         type  2       z39.50                  string
         type  3       other                   string



  7.3.8.  The Element Set (.est) Files

  The element set specification files describe a selection of a subset
  of the elements of a database record. The element selection mechanism
  is equivalent to the one supplied by the Espec-1 syntax of the Z39.50
  specification. In fact, the internal representation of an element set
  specification is identical to the Espec-1 structure, and we'll refer
  you to the description of that structure for most of the detailed
  semantics of the directives below.

  NOTE: Not all of the Espec-1 functionality has been implemented yet.
  The fields that are mentioned below all work as expected, unless
  otherwise noted.

  The directives available in the element set file are as follows:


     defaultVariantSetId OID-name
        (o) If variants are used in the following, this should provide
        the name of the variant set used (it's not currently possible to
        specify a different set in the individual variant request). In
        almost all cases (certainly all profiles known to us), the name
        Variant-1 should be given here.


     defaultVariantRequest variant-request
        (o) This directive provides a default variant request for use
        when the individual element requests (see below) do not contain
        a variant request. Variant requests consist of a blank-separated
        list of variant components. A variant component is a
        comma-separated, parenthesized triple of variant class, type,
        and value (the two former values being represented as integers).
        The value can currently only be entered as a string (this will
        change to depend on the definition of the variant in question).
        The special value (@) is interpreted as a null value, however.



     simpleElement path ['variant' variant-request]
        (o,r) This corresponds to a simple element request in Espec-1.
        The path consists of a sequence of tag-selectors, where each of
        these can consist of either:


        o  A simple tag, consisting of a comma-separated type-value pair
           in parentheses, possibly followed by a colon (:) followed by
           an occurrences-specification (see below). The tag-value can
           be a number or a string. If the first character is an
           apostrophe ('), this forces the value to be interpreted as a
           string, even if it appears to be numerical.

        o  A WildThing, represented as a question mark (?), possibly
           followed by a colon (:) followed by an occurrences
           specification (see below).

        o  A WildPath, represented as an asterisk (*). Note that the
           last element of the path should not be a wildPath (wildpaths
           don't work in this version).

        The occurrences-specification can be either the string all, the
        string last, or an explicit value-range. The value-range is
        represented as an integer (the starting point), possibly
        followed by a plus (+) and a second integer (the number of
        elements, default being one).

        The variant-request has the same syntax as the
        defaultVariantRequest above. Note that it may sometimes be
        useful to give an empty variant request, simply to disable the
        default for a specific set of fields (we aren't certain if this
        is proper Espec-1, but it works in this implementation).

  The following is an example of an element specification belonging to
  the GILS profile.



       simpleelement (1,10)
       simpleelement (1,12)
       simpleelement (2,1)
       simpleelement (1,14)
       simpleelement (4,1)
       simpleelement (4,52)
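
  As a further illustration, the following sketch combines the variant
  directives with occurrence specifications. It is not taken from any
  shipped profile: the tag numbers are arbitrary, and the variant
  triple (4,1,eng) assumes that class 4, type 1 of Variant-1 denotes
  the language of the element, which you should check against your own
  .var file. We also assume that lines beginning with # are treated as
  comments.



       # Assumed: Variant-1 class 4, type 1 is the language
       defaultVariantSetId Variant-1
       defaultVariantRequest (4,1,eng)
       # Two occurrences of element (2,1), starting at the first
       simpleelement (2,1):1+2
       # Last occurrence only
       simpleelement (1,14):last
       # Explicit variant request, used instead of the default
       simpleelement (4,52) variant (4,1,dan)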



  7.3.9.  The Schema Mapping (.map) Files

  Sometimes, the client might want to receive a database record in a
  schema that differs from the native schema of the record. For
  instance, a client might only know how to process WAIS records, while
  the database record is represented in a more specific schema, such as
  GILS. In this module, a mapping of data to one of the MARC formats is
  also thought of as a schema mapping (mapping the elements of the
  record into fields consistent with the given MARC specification, prior
  to actually converting the data to ISO2709). This use of the
  object identifier for USMARC as a schema identifier represents an
  overloading of the OID which might not be entirely proper. However, it
  represents the dual role of schema and record syntax which is assumed
  by the MARC family in Z39.50.

  NOTE: The schema-mapping functions are so far limited to a
  straightforward mapping of elements. This should be extended with
  mechanisms for conversions of the element contents, and conditional
  mappings of elements based on the record contents.

  These are the directives of the schema mapping file format:


     targetName name
        (m) A symbolic name for the target schema of the table. Useful
        mostly for diagnostic purposes.


     targetRef OID-name
        (m) An OID name for the target schema.  This is used, for
        instance, by a server receiving a request to present a record in
        a different schema from the native one.


     map element-name target-path
        (o,r) Adds an element mapping rule to the table.
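
  A mapping table might look roughly like the sketch below. Treat it
  with some caution: the element names (title, abstract) are
  hypothetical, the OID name USmarc is assumed to be known to your
  installation, and the form of the target path (a slash-separated
  sequence of parenthesized type-value tags) is our assumption, since
  it is not specified above. Compare with the map files shipped with
  the GILS schema definition before relying on it.



       # Assumed target-path syntax: (type,value)/(type,value)
       targetName usmarc
       targetRef USmarc
       # Hypothetical element names mapped to MARC field/subfield
       map title (3,245)/(3,a)
       map abstract (3,520)/(3,a)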


  7.3.10.  The MARC (ISO2709) Representation (.mar) Files

  This file provides rules for representing a record in the ISO2709
  format. The rules pertain mostly to the values of the constant-length
  header of the record.

  NOTE: This will be described better. We're in the process of re-
  evaluating and most likely changing the way that MARC records are
  handled by the system.


  7.3.11.  Field Structure and Character Sets

  In order to provide a flexible approach to national character set
  handling, Zebra allows the administrator to configure the
  system to handle any 8-bit character set -- including sets that
  require multi-octet diacritics or other multi-octet characters. The
  definition of a character set includes a specification of the
  permissible values, their sort order (this affects the display in the
  SCAN function), and relationships between upper- and lowercase
  characters. Finally, the definition includes the specification of
  space characters for the set.

  The operator can define different character sets for different fields,
  typical examples being standard text fields, numerical fields, and
  special-purpose fields such as WWW-style linkages (URx).

  The field types, and hence character sets, are associated with data
  elements by the .abs files (see above). The file default.idx provides
  the association between field type codes (as used in the .abs files)
  and the character map files (with the .chr suffix). The format of the
  .idx file is as follows:


     index field type code
        This directive introduces a new search index code. The argument
        is a one-character code to be used in the .abs files to select
        this particular index type. An index, roughly, corresponds to a
        particular structure attribute during search. Refer to section
        ``Search''.


     sort field code type
        This directive introduces a sort index. The argument is a one-
        character code to be used in the .abs file to select this
        particular index type. The corresponding use attribute must be
        used in the sort request to refer to this particular sort index.
        The corresponding character map (see below) is used in the sort
        process.


     completeness boolean
        This directive enables or disables complete field indexing. The
        value of the boolean should be 0 (disable) or 1 (enable). If
        completeness is enabled, the index entry will contain the
        complete contents of the field (up to a limit), with words
        (non-space characters) separated by single space characters
        (normalized to " " on display). When completeness is disabled,
        each word is indexed as a separate entry. Complete subfield
        indexing is most useful for
        fields which are typically browsed (eg.  titles, authors, or
        subjects), or instances where a match on a complete subfield is
        essential (eg. exact title searching). For fields where
        completeness is disabled, the search engine will interpret a
        search containing space characters as a word proximity search.


     charmap filename
        This is the filename of the character map to be used for this
        index or field type.
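
  To make the grouping of these directives concrete, here is a small
  sketch of an .idx file defining a word index, a complete-field
  (phrase) index, and a sort index. The one-character codes w, p, and s
  and the file name string.chr are illustrative assumptions; use
  whatever codes your .abs files refer to and whatever character map
  files you have installed.



       # Word index: every word becomes a separate entry
       index w
       completeness 0
       charmap string.chr

       # Complete-field index, e.g. for exact title matching
       index p
       completeness 1
       charmap string.chr

       # Sort index (assumed code s)
       sort s
       completeness 1
       charmap string.chr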

  The contents of the character map files are structured as follows:


     lowercase value-set
        This directive introduces the basic value set of the field type.
        The format is an ordered list (without spaces) of the characters
        which may occur in "words" of the given type. The order of the
        entries in the list determines the sort order of the index. In
        addition to single characters, the following combinations are
        legal:


        o  Backslashes may be used to introduce three-digit octal, or
           two-digit hex representations of single characters (preceded
           by x).  In addition, the combinations \\, \r, \n, \t, and \s
           (space -- remember that real space-characters may not occur
           in the value definition) are recognised, with their usual
           interpretation.

        o  Curly braces {} may be used to enclose ranges of single
           characters (possibly using the escape convention described in
           the preceding point), eg. {a-z} to introduce the standard
           range of ASCII characters. Note that the interpretation of
           such a range depends on the concrete representation in your
           local, physical character set.

        o  Parentheses () may be used to enclose multi-byte characters
           - eg. diacritics or special national combinations (eg.
           Spanish "ll"). When found in the input stream (or a search
           term), these characters are viewed and sorted as a single
           character, with a sorting value depending on the position of
           the group in the value statement.


     uppercase value-set
        This directive introduces the upper-case equivalents to the
        value set (if any). The number and order of the entries in the
        list should be the same as in the lowercase directive.



     space value-set
        This directive introduces the characters which separate words
        in the input stream. Depending on the completeness mode of the
        field in question, these characters either terminate an index
        entry, or delimit individual "words" in the input stream. The
        order of the elements is not significant -- otherwise the
        representation is the same as for the uppercase and lowercase
        directives.


     map value-set target
        This directive introduces a mapping between each of the members
        of the value-set on the left to the character on the right. The
        character on the right must occur in the value set (the
        lowercase directive) of the character set, but it may be a
        parenthesis-enclosed multi-octet character. This directive may
        be used to map diacritics to their base characters, or to map
        HTML-style character-representations to their natural form, etc.
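
  Putting the four directives together, a minimal character map could
  be sketched as follows. This is an illustration rather than a copy of
  any distributed file: the placement of the Spanish "ll" between l and
  m, and the choice to fold the HTML-style entity &eacute; to a plain
  e, are assumptions made only to show the syntax. We also assume that
  lines beginning with # are comments.



       # Digits sort first; "ll" is one character between l and m
       lowercase {0-9}{a-l}(ll){m-z}
       # Same number and order of entries as in lowercase
       uppercase {0-9}{A-L}(LL){M-Z}
       # Octal 001 through 040 (space) separate words
       space {\001-\040}
       # Assumed mapping: HTML-style entity to its base character
       map (&eacute;) e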


  7.4.  Exchange Formats

  Converting records from the internal structure to an exchange format
  is largely an automatic process. Currently, the following exchange
  formats are supported:


  o  GRS-1. The internal representation is based on GRS-1, so the
     conversion here is straightforward. The system will create applied
     variant and supported variant lists as required, if a record
     contains variant information.

  o  SUTRS. Again, the mapping is fairly straightforward. Indentation is
     used to show the hierarchical structure of the record. All "GRS"
     type records support both the GRS-1 and SUTRS representations.

  o  ISO2709-based formats (USMARC, etc.). Only records with a two-level
     structure (corresponding to fields and subfields) can be directly
     mapped to ISO2709. For records with a different structuring (eg.,
     GILS), the representation in a structure like USMARC involves a
     schema-mapping (see section ``Schema Mapping''), to an "implied"
     USMARC schema (implied, because there is no formal schema which
     specifies the use of the USMARC fields outside of ISO2709). The
     resultant, two-level record is then mapped directly from the
     internal representation to ISO2709. See the GILS schema definition
     files for a detailed example of this approach.

  o  Explain. This representation is only available for records
     belonging to the Explain schema.

  o  Summary.  This ASN.1-based structure is only available for records
     belonging to the Summary schema - or schemas which provide a
     mapping to this schema (see the description of the schema mapping
     facility above).

  o  SOIF. Support for this syntax is experimental, and is currently
     keyed to a private Index Data OID (1.2.840.10003.5.1000.81.2). All
     abstract syntaxes can be mapped to the SOIF format, although nested
     elements are represented by concatenation of the tag names at each
     level.

  o  XML. The use of XML as a transfer syntax in Z39.50 is not yet
     widely established so the use of it here must be characterised as
     somewhat experimental. The tag-names used are taken from the tag-
     set in use, except for local string tags where the tag itself is
     passed through unchanged.


  8.  License

  Zebra Copyright (c) 1995-2000 Index Data ApS.

  All rights reserved.

  Use and redistribution in source or binary form, with or without
  modification, of any or all of this software and documentation is
  permitted, provided that the following Conditions 1 to 6 set out below
  are met.

  1. Unless prior specific written permission is obtained this copyright
  and permission notice appear with all copies of the software and its
  documentation. Notices of copyright or attribution which appear at the
  beginning of any file must remain unchanged.

  2. The names of Index Data or the individual authors may not be used
  to endorse or promote products derived from this software without
  specific prior written permission.

  3. Source code or binary versions of this software and its
  documentation may be used freely in not for profit applications
  limited to databases of 100,000 records maximum. Other applications -
  such as publishing over 100,000 records, providing for-pay services,
  distributing a product based in whole or in part on this software or
  its documentation, or generally distributing this software or its
  documentation under a different license require a commercial license
  from Index Data.

  4. The software may be installed and used for evaluation purposes in
  conjunction with such commercially licensed applications for a trial
  period no longer than 60 days.

  5. Unless a prior specific written agreement is obtained THIS SOFTWARE
  IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EXPRESS,
  IMPLIED, OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL
  INDEX DATA BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT OR
  CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING
  FROM LOSS OF USE, DATA OR PROFITS, WHETHER OR NOT ADVISED OF THE
  POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF
  OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

  6. Commercial licenses and support agreements for Zebra and related
  Index Data products such as Z'mbol (c) - and written agreements
  relating to these Conditions may be obtained only from Index Data or
  its appointed agents as follows:

       Index Data: www.indexdata.dk
       Fretwell-Downing Informatics: www.fdgroup.co.uk
       Fretwell-Downing Informatics USA: www.fdi.com


  9.  About Index Data and the Zebra Server

  Index Data is a consulting and software-development enterprise that
  specialises in information management and retrieval applications. Our
  interests and expertise span a broad range of related fields, and one
  of our primary, long-term objectives is the development of a powerful
  information management system with open network interfaces and
  hypermedia capabilities. Zebra is an important component in this
  strategy.

  We make this software available free of charge for not-for-profit
  purposes, as a service to the networking community, and to further the
  development and use of quality software for open network
  communication. We encourage your comments and questions if you have
  ideas, things you would like to see in future versions, or things you
  would like to contribute.

  If you like this software, and would like to use all or part of it in
  a commercial product, or to provide a commercial database service,
  please contact us. The Z'mbol Information System represents the
  commercial variant of Zebra. It includes full support, additional
  functionality, and performance-boosting features, and it has what we
  think is a very exciting development path.



       Index Data
       Ryesgade 3
       DK-2200 Copenhagen N



       Phone: +45 3536 3672
       Fax  : +45 3536 0449
       Email: info@indexdata.dk



  The Random House College Dictionary, 1975 edition, offers this
  definition of the word "Zebra":

  Zebra, n., any of several horselike, African mammals of the genus
  Equus, having a characteristic pattern of black or dark-brown stripes
  on a whitish background.



