Map tables, which may specify mappings to other database profiles, if desired.
Possibly, a set of rules describing the mapping of elements to a MARC representation.
Several of the entries above simply refer to other files, which describe the given objects.
The file may contain the following directives:
(m) The attribute set that is used for indexing and searching records belonging to this profile.
(o) The tag set (if any) that describes the fields of the records.
(o,r) Adds an element to the abstract record syntax of the schema. The path follows the syntax suggested by the Z39.50 document, that is, a sequence of tags separated by slashes (/). Each tag is given as a comma-separated pair of tag type and tag value, enclosed in parentheses. The name is the name of the element, and the attributes specify, as a comma-separated list, which attributes to use when indexing the element. A ! in place of the attribute name is equivalent to specifying an attribute name identical to the element name. A - in place of the attribute name specifies that no indexing is to take place for the given element. The attributes can be qualified with field types to specify which character set should govern the indexing procedure for that field. The same data element may be indexed into several different fields, using different character set definitions. See the section called Field Structure and Character Sets. The default field type is w for word.
Specifies indexing for record nodes given by xpath. Unlike directive elm, this directive allows you to index attribute contents. The xpath uses a syntax similar to XPath. The attributes have the same syntax and meaning as in directive elm, except that ! refers to the nodes selected by xpath.
This directive specifies the character encoding for external records. It is ignored for records, such as XML, that specify their encoding within the file via a header. If this directive is not given and no encoding is set within the external records, ISO-8859-1 encoding is assumed.
If this directive is followed by enable, then extra indexing is performed to allow for XPath-like queries. If this directive is not specified - equivalent to disable - no extra XPath-indexing is performed.
Note: The mechanism for controlling indexing is not adequate for complex databases, and will probably be moved into a separate configuration table eventually.
The following is an excerpt from the abstract syntax file for the GILS profile.
name gils
reference GILS-schema
attset gils.att
tagset gils.tag
varset var1.var

maptab gils-usmarc.map

# Element set names

esetname VARIANT gils-variant.est  # for WAIS-compliance
esetname B gils-b.est
esetname G gils-g.est
esetname F @

elm (1,10)               rank                        -
elm (1,12)               url                         -
elm (1,14)               localControlNumber          Local-number
elm (1,16)               dateOfLastModification      Date/time-last-modified
elm (2,1)                title                       w:!,p:!
elm (4,1)                controlIdentifier           Identifier-standard
elm (2,6)                abstract                    Abstract
elm (4,51)               purpose                     !
elm (4,52)               originator                  -
elm (4,53)               accessConstraints           !
elm (4,54)               useConstraints              !
elm (4,70)               availability                -
elm (4,70)/(4,90)        distributor                 -
elm (4,70)/(4,90)/(2,7)  distributorName             !
elm (4,70)/(4,90)/(2,10) distributorOrganization     !
elm (4,70)/(4,90)/(4,2)  distributorStreetAddress    !
elm (4,70)/(4,90)/(4,3)  distributorCity             !
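To make the structure of the elm lines in the excerpt above concrete, the following Python sketch takes one such line apart. It is an illustration only, not the parser used by the server: the function names are hypothetical, and the type:attribute order for field-type qualifiers is inferred from the "w:!,p:!" entry in the excerpt.

```python
import re

# One tag: "(type,value)". The tag type is numeric; the value is kept as an
# integer when it consists of digits only, otherwise as a string.
TAG_RE = re.compile(r"\((\d+),([^()/,]+)\)")


def parse_path(path):
    """Split a path such as '(4,70)/(4,90)/(2,7)' into (type, value) pairs."""
    tags = []
    for step in path.split("/"):
        match = TAG_RE.fullmatch(step)
        if match is None:
            raise ValueError(f"malformed tag: {step!r}")
        value = match.group(2)
        tags.append((int(match.group(1)), int(value) if value.isdigit() else value))
    return tags


def parse_attributes(name, spec):
    """Expand the attribute column into (field_type, attribute) pairs.

    '-' means no indexing; '!' stands for the element name itself;
    an unqualified attribute gets the default field type 'w'.
    """
    if spec == "-":
        return []
    pairs = []
    for item in spec.split(","):
        field_type, sep, attr = item.partition(":")
        if not sep:  # no qualifier given: default to word indexing
            field_type, attr = "w", item
        pairs.append((field_type, name if attr == "!" else attr))
    return pairs


def parse_elm(line):
    """Parse 'elm <path> <name> <attributes>' into (tags, name, index pairs)."""
    keyword, path, name, spec = line.split()
    if keyword != "elm":
        raise ValueError(f"not an elm directive: {line!r}")
    return parse_path(path), name, parse_attributes(name, spec)
```

For instance, parse_elm("elm (2,1) title w:!,p:!") yields the path [(2, 1)], the name "title", and two index entries for the attribute "title", one with field type w and one with field type p.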
This file type describes the Use elements of an attribute set. It contains the following directives.
This is an excerpt from the GILS attribute set definition. Notice how the file describing the bib-1 attribute set is referenced.
name gils
reference GILS-attset
include bib1.att

att 2001 distributorName
att 2002 indextermsControlled
att 2003 purpose
att 2004 accessConstraints
att 2005 useConstraints
The following is an excerpt from the TagsetG definition file.
name tagsetg
reference TagsetG
type 2

tag 1  title             string
tag 2  author            string
tag 3  publicationPlace  string
tag 4  publicationDate   string
tag 5  documentId        string
tag 6  abstract          string
tag 7  name              string
tag 8  date              generalizedtime
tag 9  bodyOfDisplay     string
tag 10 organization      string
These are the directives allowed in the file.
The following is an excerpt from the file describing the variant set Variant-1.
name variant-1
reference Variant-1

class 1 variantId

  type 1 variantId octetstring

class 2 body

  type 1 iana      string
  type 2 z39.50    string
  type 3 other     string
The directives available in the element set file are as follows:
The occurrences-specification can be either the string all, the string last, or an explicit value-range. The value-range is represented as an integer (the starting point), possibly followed by a plus (+) and a second integer (the number of elements, default being one).
The variant-request has the same syntax as the defaultVariantRequest above. Note that it may sometimes be useful to give an empty variant request, simply to disable the default for a specific set of fields (we aren't certain if this is proper Espec-1, but it works in this implementation).
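As an illustration of the occurrences-specification described above, the following Python sketch turns such a string into either a keyword or a (start, count) pair. The function name and the return representation are assumptions made for this example; they are not part of the file format.

```python
def parse_occurrences(spec):
    """Parse an occurrences-specification.

    Returns the string 'all' or 'last', or a (start, count) tuple for an
    explicit value-range such as '3' (count defaults to 1) or '3+2'.
    """
    if spec in ("all", "last"):
        return spec
    start, plus, count = spec.partition("+")
    if not start.isdigit() or (plus and not count.isdigit()):
        raise ValueError(f"malformed occurrences-specification: {spec!r}")
    # Without a '+' part, the range covers a single element.
    return int(start), (int(count) if plus else 1)
```

Under these assumptions, "3+2" is read as two elements starting at position 3, and a bare "5" as the single element at position 5.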
The following is an example of an element specification belonging to the GILS profile.
simpleelement (1,10)
simpleelement (1,12)
simpleelement (2,1)
simpleelement (1,14)
simpleelement (4,1)
simpleelement (4,52)
These are the directives of the schema mapping file format:
This directive introduces a new search index code. The argument is a one-character code to be used in the .abs files to select this particular index type. An index, roughly, corresponds to a particular structure attribute during search. Refer to the Section called Search in Chapter 7.
This directive introduces a sort index. The argument is a one-character code to be used in the .abs file to select this particular index type. The corresponding use attribute must be used in the sort request to refer to this particular sort index. The corresponding character map (see below) is used in the sort process.
This directive enables or disables complete field indexing. The value of the boolean should be 0 (disable) or 1 (enable). If completeness is enabled, the index entry will contain the complete contents of the field (up to a limit), with words (non-space characters) separated by single space characters (normalized to " " on display). When completeness is disabled, each word is indexed as a separate entry. Complete subfield indexing is most useful for fields which are typically browsed (e.g. titles, authors, or subjects), or in instances where a match on a complete subfield is essential (e.g. exact title searching). For fields where completeness is disabled, the search engine will interpret a search containing space characters as a word proximity search.
This is the filename of the character map to be used for this index or field type.
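The difference between complete and word-by-word indexing, as described for the completeness directive above, can be sketched in a few lines of Python. This is a simplified model: it splits on ordinary whitespace rather than on the characters named in a character map, and the function name and the limit parameter are hypothetical.

```python
def index_entries(field_text, complete, limit=None):
    """Return the index entries produced for one field.

    complete=True  -> one entry holding the whole field, with runs of
                      whitespace collapsed to single spaces and optionally
                      truncated to `limit` characters.
    complete=False -> one entry per word.
    """
    words = field_text.split()  # assumption: whitespace separates words
    if not complete:
        return words
    entry = " ".join(words)
    return [entry if limit is None else entry[:limit]]
```

With completeness enabled, a title field produces a single entry suitable for browsing or exact matching; with it disabled, the same field yields one entry per word, which is what a proximity search operates on.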
The contents of the character map files are structured as follows:
This directive introduces the basic value set of the field type. The format is an ordered list (without spaces) of the characters which may occur in "words" of the given type. The order of the entries in the list determines the sort order of the index. In addition to single characters, the following combinations are legal:
Backslashes may be used to introduce three-digit octal, or two-digit hex representations of single characters (preceded by x). In addition, the combinations \\, \r, \n, \t, and \s (space; remember that real space characters may not occur in the value definition) are recognized, with their usual interpretation.
Curly braces {} may be used to enclose ranges of single characters (possibly using the escape convention described in the preceding point), e.g. {a-z} to introduce the standard range of ASCII characters. Note that the interpretation of such a range depends on the concrete representation in your local, physical character set.
Parentheses () may be used to enclose multi-byte characters, e.g. diacritics or special national combinations (e.g. Spanish "ll"). When found in the input stream (or a search term), these characters are viewed and sorted as a single character, with a sorting value depending on the position of the group in the value statement.
This directive introduces the upper-case equivalents of the value set (if any). The number and order of the entries in the list should be the same as in the lowercase directive.
This directive introduces the characters which separate words in the input stream. Depending on the completeness mode of the field in question, these characters either terminate an index entry, or delimit individual "words" in the input stream. The order of the elements is not significant; otherwise the representation is the same as for the uppercase and lowercase directives.
This directive introduces a mapping between each of the members of the value set on the left and the character on the right. The character on the right must occur in the value set (the lowercase directive) of the character set, but it may be a parenthesis-enclosed multi-octet character. This directive may be used to map diacritics to their base characters, or to map HTML-style character representations to their natural form, etc.
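The value-set notation described for the lowercase directive (backslash escapes, {} ranges, and () groups) can be illustrated with a small Python sketch that expands a definition into its ordered list of entries and derives a sort key from it. This is a rough model written for this section, not the server's implementation; the function names are hypothetical, and ranges are expanded by Unicode code point, which is only one possible "local, physical character set".

```python
SIMPLE_ESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "t": "\t", "s": " "}


def read_char(text, i):
    """Read one (possibly escaped) character at text[i]; return (char, next_i)."""
    if text[i] != "\\":
        return text[i], i + 1
    nxt = text[i + 1]
    if nxt == "x":                        # \xHH: two hex digits
        return chr(int(text[i + 2:i + 4], 16)), i + 4
    if nxt in "01234567":                 # \OOO: three octal digits
        return chr(int(text[i + 1:i + 4], 8)), i + 4
    return SIMPLE_ESCAPES[nxt], i + 2     # \\ \r \n \t \s


def parse_value_set(definition):
    """Expand a value definition into its ordered list of entries."""
    entries, i = [], 0
    while i < len(definition):
        if definition[i] == "{":          # {lo-hi}: a range of single characters
            lo, i = read_char(definition, i + 1)
            if definition[i] != "-":
                raise ValueError("expected '-' inside range")
            hi, i = read_char(definition, i + 1)
            if definition[i] != "}":
                raise ValueError("expected '}' closing range")
            i += 1
            entries.extend(chr(n) for n in range(ord(lo), ord(hi) + 1))
        elif definition[i] == "(":        # (xy): a group treated as one character
            end = definition.index(")", i)
            entries.append(definition[i + 1:end])
            i = end + 1
        else:
            ch, i = read_char(definition, i)
            entries.append(ch)
    return entries


def sort_key(word, entries):
    """Map a word to the positions of its characters in the value set.

    Groups are matched longest-first, so "ll" wins over two single "l"s.
    """
    rank = {entry: pos for pos, entry in enumerate(entries)}
    longest = max(len(entry) for entry in entries)
    key, i = [], 0
    while i < len(word):
        for size in range(longest, 0, -1):
            piece = word[i:i + size]
            if piece in rank:
                key.append(rank[piece])
                i += len(piece)
                break
        else:
            raise ValueError(f"character not in value set: {word[i]!r}")
    return key


entries = parse_value_set(r"{a-k}l(ll){m-z}\s")
print(sorted(["llama", "luz", "lado"], key=lambda w: sort_key(w, entries)))
# ['lado', 'luz', 'llama']
```

Because the group (ll) is listed after the single l in this hypothetical definition, every word beginning with "ll" sorts after every word beginning with a plain "l", which shows how the position of a group in the value statement determines its sorting value.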