		PyLTXML -- The LT-XML Python Interface			
			   Release 1.3, August 2002
		Richard Tobin, Henry S. Thompson and Chris Brew

Introduction
------------

This package interfaces our high-performance validating C API for XML
to Python.  It is known to work with Python 1.6 and later, but the
binary version of this release is specialised to Python 2.2.  It
requires LT-XML version 1.2. Please report any difficulties or
bugs which you encounter and we will do our best to deal with them.

There is no documentation beyond this file: please refer to the LT XML
documentation for details of the C API and structures which are being
made available in Python by this package.  Many PyLTXML functions
have the same names as LT XML functions: check the LT XML
documentation for details.

This distribution is governed by the GNU General Public License: see
the accompanying Copyright and COPYING files for details.  In the case of
a binary distribution on its own, this means you may use PyLTXML
yourself for any purpose, but may not redistribute it in any form,
until and unless you obtain the source distribution as well, and
comply with the GPL with respect to redistribution.

See 00INSTALL for source installation instructions -- if you opened
one of the binary distributions far enough to be reading this, your
installation is almost certainly complete already.

The module PyLTXML defines several types, functions, constants and one
error.

The types are:

  FileType
  DoctypeType
  ElementTypeType
  ContentParticleType
  AttrDefnType
  BitType
  ItemType
  OOBType
  ERefType
  QueryType

There is no type corresponding to the C NSL_Data; the data field of an
Item is just a list of strings or unicode strings and Items.
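
Because the data field mixes strings and nested Items, collecting the
text below an element is a short recursive walk.  The following is a
minimal sketch, assuming only the data and label slots described in this
file; item_text is an illustrative helper name, not a PyLTXML function,
and the code is written for a current Python interpreter.

```python
def item_text(item):
    """Concatenate the character data found below an Item-like object.

    Assumes `item.data` is a sequence (or None) of strings and nested
    Items.  `item_text` is a hypothetical helper, not part of PyLTXML.
    """
    parts = []
    for child in item.data or ():
        if hasattr(child, "label"):
            # Anything with a label slot is treated as a nested Item.
            parts.append(item_text(child))
        else:
            # Otherwise it is assumed to be a (unicode) string.
            parts.append(child)
    return "".join(parts)
```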

The Python internal type objects for all the above types are exposed
as PyLTXML.xxxType, e.g. PyLTXML.FileType.

Values defined as enumerated types in the C version are represented as
strings in Python, and are listed below with the slots and functions
they apply to.

The slots (or "attributes" as Python calls them) for each type are
tabulated below.  The value type is given in parentheses -- 'unicode'
stands for unicode string.

  File:
    doctype - (Doctype) the document type
    where - current input entity location, a four-tuple of
         (entityName (string), lineNum (int), charPos (int), url (entity url))
    seenValidityError - (integer) 0/1 depending on whether a validity error
          has not/has been seen so far, or 0 if not validating
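
The where slot is convenient for diagnostics.  As a small sketch,
assuming the four-tuple layout given above, the helper below (its name,
format_where, is purely illustrative) turns it into a one-line location
string.

```python
def format_where(where):
    """Render a File.where four-tuple as 'entity:line:char (url)'.

    `format_where` is a hypothetical helper, not part of PyLTXML.
    """
    entity_name, line_num, char_pos, url = where
    return "%s:%d:%d (%s)" % (entity_name, line_num, char_pos, url)
```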

  Doctype:
    ddb - (string) name of DDB file [obsolete for XML]
    encoding - (string) name of (input) encoding
    xencoding - (string) name of output encoding
    sdd - (string) standalone declaration ("yes", "no", "unspecified")
    elementTypes - (dictionary->ElementType) element type declarations 
      from DTD
    entities - (dictionary->unicode) internal or external
     general entity declarations; internal value is definition,
     external is SYSTEM id
    parameterEntities - (dictionary->unicode) similarly for parameter entities
    name - (unicode) DOCTYPE name, if any
    doctypeStatement - (unicode) The entire DOCTYPE string

  Bit:
    type  - one of "bad", "start", "end", "empty", "eof", "text",
                    "pi", "doctype", "comment"
    item  - (Item) available if type is "start" or "empty"
    body  - (unicode) available if type is "text", "pi", "doctype" or "comment"
    label - (unicode) available if type is "start", "empty", or "end"
    llabel - (unicode) ditto, local part of tag
    prefix - (unicode) ditto, prefix part of tag
    nsuri  - (string) ditto, namespace part of tag or None if unprefixed
    isCData - (boolean) available if type is "text"
    isERef - (boolean) available if type is "text"
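
Code that consumes Bits usually dispatches on the type slot.  The sketch
below assumes only that each object has a type slot holding one of the
strings listed above; summarise_bits is an illustrative name, and the
input can be any iterable of Bit-like objects.

```python
def summarise_bits(bits):
    """Count Bits by type and track the deepest element nesting seen.

    `summarise_bits` is a hypothetical helper, not part of PyLTXML.
    """
    counts = {}
    depth = 0
    max_depth = 0
    for bit in bits:
        counts[bit.type] = counts.get(bit.type, 0) + 1
        if bit.type == "start":
            depth += 1
            max_depth = max(max_depth, depth)
        elif bit.type == "end":
            depth -= 1
        elif bit.type == "empty":
            # An empty element opens and closes within a single Bit.
            max_depth = max(max_depth, depth + 1)
    return counts, max_depth
```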

  Item:
    type  - one of "inchoate", "non_empty", "empty", "free"
    label - (unicode) element's tag
    llabel - (unicode) local part of element's tag
    prefix - (unicode) prefix part of tag
    nsuri - (string) namespace URI part of element's tag, or None if not qualified
    nsdict - (dict -> string) Namespace declarations in force
    data  - (tuple or list of unicode and Items)
    parent - (Item) the Item in whose data this item is, or None
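
The parent slot makes it possible to recover where an Item sits in the
tree.  A minimal sketch, assuming only the label and parent slots above
(label_path is an illustrative name):

```python
def label_path(item):
    """Return the labels from the outermost ancestor down to `item`.

    `label_path` is a hypothetical helper, not part of PyLTXML.
    """
    labels = []
    node = item
    while node is not None:
        labels.append(node.label)
        node = node.parent  # None once the outermost Item is reached
    labels.reverse()
    return "/".join(labels)
```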

  ElementType:
    name - (unicode)
    type - one of "MIXED", "ANY", "EMPTY", "ELEMENT"
    particle - (ContentParticle)
    attrDefns - (dictionary -> AttrDefn)

  ContentParticle:
    type - one of "#PCDATA", "NAME", "SEQUENCE", "CHOICE"
    name - (unicode) only if type=="NAME"
    repetition - one of "?", "+", "*" or None
    children - (list of ContentParticle)
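
A ContentParticle tree can be turned back into DTD-style notation by
recursing over children.  The sketch below assumes only the four slots
above, and that children is consulted only for "SEQUENCE" and "CHOICE"
particles; particle_to_string is an illustrative name.

```python
def particle_to_string(particle):
    """Render a ContentParticle-like object in DTD content-model syntax.

    `particle_to_string` is a hypothetical helper, not part of PyLTXML.
    """
    if particle.type == "#PCDATA":
        body = "#PCDATA"
    elif particle.type == "NAME":
        body = particle.name
    else:
        # "SEQUENCE" uses commas, "CHOICE" uses vertical bars.
        separator = ", " if particle.type == "SEQUENCE" else " | "
        inner = [particle_to_string(child) for child in particle.children]
        body = "(" + separator.join(inner) + ")"
    # repetition is "?", "+", "*" or None.
    return body + (particle.repetition or "")
```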

  AttrDefn:
    name - (unicode)
    type - one of "CDATA", "NMTOKEN", "ENTITY", "IDREF", "NMTOKENS",
                   "ENTITIES", "IDREFS", "ID", "NOTATION", "ENUMERATION"
    defType - one of "#REQUIRED", "#IMPLIED", "NONE", "#FIXED"
    defValue - (unicode)
    allowedValues - (list of unicode)
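
The attrDefns dictionary of an ElementType can be filtered on defType.
As a sketch, assuming attrDefns behaves like an ordinary dictionary from
attribute name to AttrDefn (required_attributes is an illustrative
name):

```python
def required_attributes(element_type):
    """List the names of attributes declared #REQUIRED for an element type.

    `required_attributes` is a hypothetical helper, not part of PyLTXML.
    """
    names = [name
             for name, defn in element_type.attrDefns.items()
             if defn.defType == "#REQUIRED"]
    names.sort()
    return names
```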

  OOB:
    type - one of "comment", "pi", "cdata"
    data - (unicode)

  ERef:
    name - (unicode)

  Query:
    (none)

All these slots are read-only, except Item.data.

Bits and Items are freed when there is no reference to them, and there
is no way to free them explicitly.  It should therefore be impossible
to get an item of type "free".  Automatic freeing can be disabled via
AutoFreeNSLObjects (see below).

The functions are listed below.  Optional arguments are in square
brackets; omitting them is equivalent to passing a value of None,
which has the same effect as passing NULL to the C function.  Only
non-obvious argument types are described.

  Open(filename, [doctype,] type) -> File
    type is either NSL_read or'ed with zero or more of
        NSL_read_all_bits, NSL_read_no_consume_prolog,
        NSL_read_no_normalise_attributes, NSL_read_declaration_warnings,
        NSL_read_strict, NSL_read_no_expand, NSL_read_validate,
        NSL_read_namespaces, NSL_read_defaulted_attributes,
        NSL_read_relaxed_any, NSL_read_allow_undeclared_nsattributes
    or NSL_write or'ed with zero or more of
        NSL_write_no_doctype, NSL_write_no_expand, NSL_write_plain,
        NSL_write_fancy, NSL_write_canonical, NSL_write_default,
        NSL_write_style
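
Combining flags might look like the sketch below.  It assumes that the
NSL_* flags are integer constants exposed as attributes of the PyLTXML
module; the module is passed in as the ltxml parameter so that the
helper (open_for_validation, an illustrative name) does not depend on
how PyLTXML was imported.  The doctype argument is passed explicitly as
None.

```python
def open_for_validation(ltxml, filename):
    """Open `filename` for reading with validation and namespaces enabled.

    `ltxml` is assumed to be the imported PyLTXML module.  The flag
    constants are assumed to be integers available as module attributes.
    `open_for_validation` is a hypothetical helper, not part of PyLTXML.
    """
    flags = (ltxml.NSL_read
             | ltxml.NSL_read_validate
             | ltxml.NSL_read_namespaces)
    return ltxml.Open(filename, None, flags)
```

A caller would then write something like
open_for_validation(PyLTXML, "small.xml") after importing PyLTXML.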

  OpenStream(filename, [doctype,] type, encoding)
  OpenURL(url, [doctype,] type, encoding)
    type as above
    encoding is the numerical value of an encoding taken from the
      dictionary CharacterEncodingNames

  FOpen(pfile, [doctype,] type) -> File
    pfile is a Python file, e.g. sys.stdin
    type is as above

  OpenString(string, [doctype,] readType) -> File
    readType is a read type as above

  DoctypeFromDdb(filename) -> Doctype [obsolete]

  ItemParse(file, item) -> Item

  GetNextBit(file) -> Bit or none
    Returns None at EOF so you should never see a bit of type "eof".
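
Since GetNextBit returns None at EOF, a read loop can be wrapped in a
generator.  This is a sketch only: ltxml stands for the imported PyLTXML
module, iter_bits is an illustrative name, and the generator syntax
assumes a Python version newer than the ones this release was built for.

```python
def iter_bits(ltxml, xml_file):
    """Yield Bits from an open File until GetNextBit returns None.

    `ltxml` is assumed to be the imported PyLTXML module.
    `iter_bits` is a hypothetical helper, not part of PyLTXML.
    """
    while True:
        bit = ltxml.GetNextBit(xml_file)
        if bit is None:
            # End of input: stop the iteration.
            return
        yield bit
```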

  Close(file) -> none

  Print(file, value) -> none
    value is a Bit, an Item or a string

  ForceNewline(file) -> none

  PrintStartTag(file,label) -> none
  PrintTextLiteral(file,string) -> none
  PrintEndTag(file, label) -> none
    Respectively print a start tag and attributes, text with entity
    references as required, and an end tag.
    label is a string.
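
The three calls are naturally used together to emit a simple element.
A sketch, in which ltxml stands for the imported PyLTXML module,
out_file is assumed to be a File opened with a write type, and
write_text_element is an illustrative name:

```python
def write_text_element(ltxml, out_file, label, text):
    """Write <label>text</label> to a File opened for writing.

    `ltxml` is assumed to be the imported PyLTXML module.
    `write_text_element` is a hypothetical helper, not part of PyLTXML.
    """
    ltxml.PrintStartTag(out_file, label)
    ltxml.PrintTextLiteral(out_file, text)
    ltxml.PrintEndTag(out_file, label)
```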

  GetAttrStringVal(item, name) -> string
  GetAttrVal(item, name) -> string [no difference from GetAttrStringVal]
  PutAttrVal(item, name, value) -> integer (boolean)
  NewAttrVal(item, name, value) -> none
    name and value are unicode
    They are automatically "uniquified".
  ItemActualAttributes(item) -> list of attribute name,value pairs
  ItemActualAttributesNS(item) -> list of name,value,nsuri,localName tuples
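
Because ItemActualAttributes returns name,value pairs, the result can be
turned into a dictionary for lookup by name.  A sketch, with ltxml
standing for the imported PyLTXML module and attributes_as_dict an
illustrative name:

```python
def attributes_as_dict(ltxml, item):
    """Return an Item's actual attributes as a {name: value} dictionary.

    `ltxml` is assumed to be the imported PyLTXML module.
    `attributes_as_dict` is a hypothetical helper, not part of PyLTXML.
    """
    return dict(ltxml.ItemActualAttributes(item))
```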

  LookupPrefix(item, prefix) -> nsuri
     prefix is unicode

  ParseQuery(doctype, querystring) -> Query
  ParseQueryR(doctype, querystring) -> Query
    querystring is a string

  GetNextQueryItem(infile, query, [outfile]) -> Item or none

  RetrieveQueryItem(item, query, [fromitem]) -> Item or none
    fromitem should be None (or omitted) on the first call
  RetrieveQueryData(item,query) -> None -- not implemented yet

  Item(doctype, label, data)
    creates a new Item.
    data should be a list or None to create an empty Item; note that
     this is different from passing an empty list which creates a
     non-empty Item with no content.

  AutoFreeNSLObjects(boolean) -> None
   Turn garbage collection of NSLItems etc. on or off (default is on)

Various error conditions raise an exception; the error object is
XMLinter.error, available as the value of PyLTXML.error.
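
Callers can catch the module's error object around any PyLTXML call.
The sketch below assumes that PyLTXML.error can be used in an except
clause like an ordinary exception class, and that NSL_read is available
as a module attribute; ltxml stands for the imported PyLTXML module and
open_or_none is an illustrative name.

```python
def open_or_none(ltxml, filename):
    """Open `filename` for reading, returning None if PyLTXML signals an error.

    `ltxml` is assumed to be the imported PyLTXML module, and
    `ltxml.error` is assumed to be usable in an except clause.
    `open_or_none` is a hypothetical helper, not part of PyLTXML.
    """
    try:
        return ltxml.Open(filename, None, ltxml.NSL_read)
    except ltxml.error:
        return None
```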

A list of all recognised character encodings is available as
PyLTXML.CharacterEncodingNames.

EXAMPLE USAGE

This distribution includes simple.py, a minimal example program, with
associated data file in the example sub-directory.

	Command:
		python simple.py < small.xml
	Output:
                ('unknown', 'UTF-8')
                [(u'ID', u'P12830'), (u'GSC', u'123.676139019108'), (u'K1', u'71'), (u'K2', u'9352'), (u'K3', u'10887'), (u'K4', u'7782277'), (u'TYPE', u'UNCODED')]

CONTACTS

See http://www.ltg.ed.ac.uk/software/xml/ to get LT XML, and for
documentation pointers for it.

This software was downloaded from
   ftp://www.ltg.ed.ac.uk/pub/LTXML/PyLTXML-1.3....

Send comments or questions to HThompson@ed.ac.uk
