Grouch's type system
-------------------

Grouch's type system is a large, useful subset of Python's type system.
The major advantages of Grouch's type system are that it is explicit and
enforced.  Since Python types are implicit (determined at run-time) and
mostly unenforced, Grouch sits quite neatly on top of Python, bringing
order and structure to a potentially chaotic situation.

Grouch understands the following major classes of data types:

  * atomic types: anything with a distinct Python type object can
    be an atomic type in Grouch, but they're intended for types with
    a single, atomic value.  The built-in types int, string, and float
    are obvious candidates (and in fact these are present as atomic
    types by default in any Grouch schema, along with long and complex).
    You can use other built-in types (e.g. file, function) as atomic
    types, or any extension type.  For example, if you use the
    mx.DateTime module, you might add DateTime as an atomic type, so
    you can declare variables as being of type DateTime and have Grouch
    enforce that requirement.

    Examples:
      "string" denotes a string variable
      "int" denotes an integer variable
      "DateTime" denotes a DateTime variable; this only works if you
        have explicitly added an atomic type called "DateTime" to your
        schema

  * container types: Python's built-in list, dictionary, set, and
    tuple types.  (Classes that act like lists, dictionaries, and tuples
    are "instance-container" types, and I haven't yet decided what to do
    about the type-class unification in Python 2.2.)

    Grouch enforces fairly stringent rules for container types:
      - lists must be homogeneous, i.e. all elements of the same
        type, and may be of any length
        Examples:
          "[string]" denotes a list of strings
          "[int|long]" denotes a list of either ints or longs
            (a union type; see below)
          "[any]" denotes a list of anything (i.e., no enforcement)
            (see below for "any" types)

      - dictionaries must be separately homogeneous: all keys must
        be of the same type, and all values must be of the same type.
        (Incidentally, Grouch knows nothing about which types are
        hashable and allowed to be dictionary keys; that's enforced by
        Python at run-time.)  The key type and value type are specified
        separately.
        
        Examples:
          "{ string : int }" denotes a dictionary mapping strings to ints
          "{string : int|long}" denotes a dictionary mapping strings
            to either ints or longs
          "{long : [string]}" denotes a dictionary mapping longs to
            lists of strings

      - tuples are heterogeneous (mixed-type) but fixed in size, and each
        "slot" is fixed in type.

        Examples:
          "(int,)" denotes a tuple containing exactly one integer
          "(string, string)" denotes a pair of strings
          "([int|long], string, int)" denotes a triple:
            list of (int or long), string, int

        Tuple types have one exception to this rule: if a tuple type is
        "extended", then the rules change for its last slot: for
        example, the extended tuple type "(string, int*)" denotes a
        tuple with exactly one string followed by zero or more ints.
        The following are all valid values of this type:
          ("foo", 3)
          ("foo", 3, 1)
          ("foo", 2, 5, 1, 6, 2, 1, 4, 5, 1, 15, 6, 2, 5)
          ("foo",)
        
        This is mainly used for tuples that act like lists, e.g. if you
        want a list of strings to be usable as a dictionary key, you
        code it as a tuple of strings instead (lists aren't hashable).
        This practice is incompatible with Grouch's basic tuple
        definition, so extended tuples are provided as an escape
        mechanism.

      - sets must be homogeneous, i.e. all elements of the same
        type.

        Examples:

          "{string}" denotes a set of strings
          "{int|long}" denotes a set of either ints or longs
            (a union type; see below)
          "{any}" denotes a set of anything (i.e., no enforcement)
            (see below for "any" types)

    Note that "of the same type" refers to Grouch types, not Python
    types.  For example, if a variable is declared "[int|long]",
    each element is checked separately to make sure it is either
    an int or a long; [1, 2L, 3] is a valid value of the type
    "[int|long]".  (Again, union types are described below.)

  * instance types: used for class instances.  A class Foo defined in
    the module foo.bar has an associated instance type "foo.bar.Foo".
    Generally, it's not enough to say that a variable is of type
    "foo.bar.Foo"; you also want to specify the instance attributes of
    Foo (and their types!).  Each instance type has an associated class
    definition that stores this information.  This is where Grouch's real
    power shines through, because typically Python data is accessed via
    an instance of some class.  If your schema has a class definition
    for that "root class", and for the class of each object reachable
    from the root, Grouch will crawl your entire object graph, ensuring
    that every instance, every attribute of every instance, and every
    element of every container anywhere in that object graph is of the
    correct type.

    The essential ingredient of a class definition is its attribute
    list.  This is described below, in "Defining a class schema".

    Examples:
      "FooBar" denotes an instance of class FooBar defined in
        the main program
      "thing.Thing" denotes an instance of class Thing defined
        in module thing

  * instance-container types: Python classes often implement the
    semantics of lists, tuples, or dictionaries.  You don't want to give
    up type-checking every attribute of instances of such classes, but
    you also want to make sure that they conform to the strict
    type-checking rules Grouch applies to containers.  Hence,
    instance-container types marry the two.

    Examples:
      "UserList.UserList [string]"
        denotes an instance of the UserList class, defined in the
        UserList module, that acts like a list of strings
      "MyDict { string : int|long }"
        denotes an instance of the MyDict class that acts like a
        dictionary mapping strings to either ints or longs

  * union types: any set of Grouch types may be combined to form a
    union type.  A candidate value is tested against each sub-type of
    the union type, and only rejected if all of the sub-types reject it.

    Examples:
      "int | long" denotes a value that may be either an int or a long
      "string | [string] | (string, string)"
        denotes a value that may be either a string, a list of strings,
        or a pair (tuple) of strings

  * wildcard type: used for variables that may hold a value of any type.
    There is only one wildcard type, spelled "any".

  * boolean type: used for boolean (true/false) values.  Strictly
    speaking, any Python value can be interpreted in a boolean way:
    e.g. 0, 0L, 0.0, "", and None are all false values, while 42,
    3.14159, and "foo!" are all true.  Grouch restricts this drastically:
    the only allowed values for boolean variables are 0, 1, and None.

  * alias types: used to define shorthand names for commonly-used 
    types.  The most common use of this is to alias the bare name of a
    class to its fully-qualified name -- e.g. if class Thing is defined
    in module project.util, then "Thing" might be an alias for
    "project.util.Thing".  ("project.util.Thing" is the instance type,
    and "Thing" is an alias type that expands to that instance type.)

    Aliases are also useful if you have a particular union type used
    frequently; instead of always spelling out "int | float | long", you
    can define "number" as an alias for this union type.  (This also
    makes it easy to change your definition of "number" if someday you
    have to extend it to handle, say, complex or rational numbers.)
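
The container, union, and wildcard rules above can be sketched as a
small recursive checker.  This is an illustration only, not Grouch's
implementation or API: the check() function and the tuple-based type
descriptions are hypothetical.  It is written for Python 3, which has
no separate long type, so "int | float" stands in for "int | long".

```python
# Illustrative sketch only: NOT Grouch's implementation or API.
# Types are modelled as plain tuples (a hypothetical representation).

def check(value, t):
    """Return True if value conforms to the type description t."""
    kind = t[0]
    if kind == "any":                  # wildcard: no enforcement
        return True
    if kind == "atomic":               # ("atomic", python_type)
        return type(value) is t[1]
    if kind == "union":                # ("union", [subtypes])
        # Rejected only if every sub-type rejects the value.
        return any(check(value, sub) for sub in t[1])
    if kind == "list":                 # ("list", element_type)
        return type(value) is list and all(check(v, t[1]) for v in value)
    if kind == "set":                  # ("set", element_type)
        return type(value) is set and all(check(v, t[1]) for v in value)
    if kind == "dict":                 # ("dict", key_type, value_type)
        if type(value) is not dict:
            return False
        return all(check(k, t[1]) and check(v, t[2])
                   for k, v in value.items())
    if kind == "tuple":                # ("tuple", [slot_types], extended)
        slots, extended = t[1], t[2]
        if type(value) is not tuple:
            return False
        if not extended:
            return (len(value) == len(slots)
                    and all(check(v, s) for v, s in zip(value, slots)))
        # Extended tuple: the last slot type matches zero or more values.
        fixed, last = slots[:-1], slots[-1]
        if len(value) < len(fixed):
            return False
        return (all(check(v, s) for v, s in zip(value, fixed))
                and all(check(v, last) for v in value[len(fixed):]))
    raise ValueError("unknown type kind: %r" % (kind,))


INT = ("atomic", int)
FLOAT = ("atomic", float)
STRING = ("atomic", str)
NUMBER = ("union", [INT, FLOAT])       # stands in for "int | long"
EXT = ("tuple", [STRING, INT], True)   # "(string, int*)"

print(check([1, 2.0, 3], ("list", NUMBER)))    # True
print(check([1, "two"], ("list", NUMBER)))     # False
print(check(("foo",), EXT))                    # True
print(check(("foo", 3, 1), EXT))               # True
print(check(("foo", "bar"), EXT))              # False
```

Each list element is tested against the union on its own, which is why
a list mixing ints and floats passes: "of the same type" means the same
Grouch type, not the same Python type.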


Type grammar
------------

[taken from the type_parser.py module]

type : NAME                     # atomic, alias, instance, boolean, any
     | container_type           # list, tuple, set, dictionary
     | NAME container_type      # instance-container type
     | union_type

container_type : list_type
               | tuple_type
               | set_type
               | dictionary_type
list_type      : "[" type "]"
tuple_type     : "(" (type ",")* type "*"? ","? ")"
set_type       : "{" type "}"
dictionary_type: "{" type ":" type "}"

union_type : type ("|" type)+

Tokens:
  NAME : [a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*
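
The grammar above is left-recursive in union_type, so a hand-written
parser has to restructure it a little.  The following is a minimal
recursive-descent sketch in Python 3, not the contents of
type_parser.py: the Parser class, the tokenize() and parse() functions,
and the nested-tuple output format are all hypothetical names chosen
for illustration.  It assumes set_type is one of the container_type
alternatives, as the prose description of sets implies.

```python
import re

# NAME pattern copied from the token definition above; the punctuation
# set is read off the grammar rules.
TOKEN_RE = re.compile(
    r"\s*(?:([a-zA-Z_][a-zA-Z0-9_]*(?:\.[a-zA-Z_][a-zA-Z0-9_]*)*)"
    r"|([\[\](){}:,|*]))"
)
OPENERS = ("[", "(", "{")


def tokenize(text):
    """Split a type string into NAME and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("bad character at offset %d" % pos)
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError("expected %r, got %r" % (expected, tok))
        self.pos += 1
        return tok

    def parse_type(self):
        # union_type : type ("|" type)+, parsed as a loop over alternatives
        alts = [self.parse_single()]
        while self.peek() == "|":
            self.take("|")
            alts.append(self.parse_single())
        return alts[0] if len(alts) == 1 else ("union", alts)

    def parse_single(self):
        tok = self.peek()
        if tok in OPENERS:
            return self.parse_container()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise ValueError("expected a type, got %r" % (tok,))
        name = self.take()
        if self.peek() in OPENERS:     # NAME container_type
            return ("instance-container", name, self.parse_container())
        return ("name", name)

    def parse_container(self):
        opener = self.take()
        if opener == "[":
            elem = self.parse_type()
            self.take("]")
            return ("list", elem)
        if opener == "{":
            first = self.parse_type()
            if self.peek() == ":":     # dictionary_type
                self.take(":")
                value = self.parse_type()
                self.take("}")
                return ("dict", first, value)
            self.take("}")             # set_type
            return ("set", first)
        # tuple_type : "(" (type ",")* type "*"? ","? ")"
        slots, extended = [self.parse_type()], False
        while self.peek() == ",":
            self.take(",")
            if self.peek() == ")":     # optional trailing comma
                break
            slots.append(self.parse_type())
        if self.peek() == "*":         # extended tuple: star on last slot
            self.take("*")
            extended = True
            if self.peek() == ",":
                self.take(",")
        self.take(")")
        return ("tuple", slots, extended)


def parse(text):
    parser = Parser(text)
    result = parser.parse_type()
    if parser.peek() is not None:
        raise ValueError("trailing input: %r" % (parser.peek(),))
    return result


print(parse("(string, int*)"))
# ('tuple', [('name', 'string'), ('name', 'int')], True)
```

A bare NAME comes back as ("name", ...) because the grammar alone cannot
tell an atomic type from an alias, instance, boolean, or "any"; that
distinction needs the schema.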


$Id: type-system.txt 22888 2003-10-27 14:45:59Z nascheme $
