----------------------------------------------------------------
The LGRAM 0.7 format specification
by Mathieu Bouchard <matju@sympatico.ca>

	version 0.7.pre3 (2001.sep.16)
	version 0.7.pre2 (2001.sep.11)
	version 0.7.pre1 (2001.aug.29)
	version 0.5.1    (1999.apr.17)
	version 0.5.0    (1999.jan.03)

----------------------------------------------------------------
0. Introduction

0.1. Goals

To provide a generic way to...

	* serialize lists and simple data
	* express complex data by lists
	* express the shapes and constraints of data

The languages that LGRAM specifically targets are:

	* Perl 5.4
	* Tcl
	* Ruby 1.6
	* Common LISP
	* ANSI C

Because those five languages are very different from each other, LGRAM
may also cover other languages implicitly.

Which languages LGRAM is mainly inspired by is a different matter.

0.2. Features

	* trivial to parse
	* reflective
	* homoiconic
	* extensible
	* layered system where each level is useful

0.3. Layers and Levels

	* 1 : Character Layer
	* 2 : Structure Layer
	* 3 : Type System
	* 4 : Calling Across ObjectSpaces and Processes

----------------------------------------------------------------
1. Character Layer

This is the default encoding layer, based on 7-bit ASCII.

Design rules of the character layer:

	* any character can be embedded inside a string
	* all characters of a string are significant (even all whitespace)
	* strings may be split over several lines
	* should look like Lisp for the most part
	* should use C-like and/or Perl-like conventions for the rest

1.1. Character Set

>	Whitespace = char( 0x09..0x0D, 0x20 )
>	Verbatim   = char( 0x20..0x21, 0x23..0x5B, 0x5D..0x7E )
>	HDigit     = char( 0x30..0x39, 0x41..0x46, 0x61..0x66 )
>	DDigit     = char( 0x30..0x39 )
>	EscQuote   = char( 0x22, 0x5C, 0x6E )

>	SymbolBody = char( 0x21, 0x24..0x27, 0x2A..0x5B,
>		0x5D..0x7A, 0x7C, 0x7E )
>	SymbolHead = char( 0x21, 0x24..0x27, 0x2A..0x2D, 0x2F, 0x3A..0x5A,
>		0x5D..0x7A, 0x7C, 0x7E )

>	FutureUse  = char( 0x23, 0x7B, 0x7D )

1.2. Escape Sequences for Strings

Some strings may be quite long, and thus you may wish to split them on
several lines by embedding end-of-lines and indentations within them.
Those are removed when reading the string.

>	EscSpace = /#{Whitespace}+\\/

Newlines embedded in strings are escaped using the letter n, as "\n".
Double-quotes and backslashes are escaped directly, as "\"" and "\\".

The double-quote is the reserved character for enclosing the characters
of a string. The backslash is the reserved character for all
special-purpose ("escape") sequences.

Any character (including those mentioned above) may also be embedded by
its hexadecimal code. The braced form takes any number of hex digits up
to 5; the unbraced form is a shortcut for exactly two hex digits.

>	EscChar  = /x\{#{HDigit}{,5}\}/ | /x#{HDigit}{2}/

The set of all possible such sequences:

>	Esc       = EscSpace | EscQuote | EscChar
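The escape rules above can be sketched as a small decoder for the body
of a String (the text between the enclosing double-quotes). This is an
illustration in Python, not part of the format: the name
decode_string_body is hypothetical, the braced hex form is assumed to
need at least one digit, hex codes are assumed to map to characters by
code point, and invalid escapes are left untouched rather than rejected.

```python
import re

# One escape sequence, including its leading backslash (rule Esc, section 1.2).
# Assumption: the braced hex form needs at least one digit.
ESC = re.compile(r"""
    \\ (?:
        (?P<space> [\x09-\x0D\x20]+ \\ )        # EscSpace: whitespace, then a backslash
      | (?P<quote> ["\\n] )                     # EscQuote: double-quote, backslash or n
      | x \{ (?P<long> [0-9A-Fa-f]{1,5} ) \}    # EscChar, braced form
      | x (?P<short> [0-9A-Fa-f]{2} )           # EscChar, two-digit shortcut
    )
""", re.VERBOSE)


def decode_string_body(body):
    """Replace every escape sequence in a String body by what it stands for."""
    def replace(match):
        if match.group("space") is not None:
            return ""                            # line-splitting whitespace is dropped
        if match.group("quote") is not None:
            quoted = match.group("quote")
            return "\n" if quoted == "n" else quoted
        digits = match.group("long") or match.group("short")
        return chr(int(digits, 16))              # assumption: hex value is a code point
    return ESC.sub(replace, body)
```

Because the substitution scans left to right without overlap, an
escaped backslash followed by the letter n decodes to a backslash and
an n, not to a newline.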

1.3. Literals

Symbols are unquoted strings for simple names that are not intended to
be looked at (unlike Strings) but rather are intended for tagging
information, such as naming types, constants, procedures, variables,
operators, enumeration elements, options, etc.

In LGRAM, Symbols are case-sensitive.

>	Symbol  = /(?![+-][\.0-9])#{SymbolHead}#{SymbolBody}*/

Floats and Integers. (Note: these two rules overlap, so an order is
necessary; try Float first.)

>	Float   = /[+-]?[0-9]*\.[0-9]*([eE][+-]?[0-9]+)?/
>	Integer = /[+-]?[0-9]+/

Strings.

>	String  = /"(#{Verbatim}|\\#{Esc})*"/
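As an illustration of why Float must be tried before Integer, here is a
minimal lexer sketch in Python. The names TOKEN and tokenize are
hypothetical. The sketch assumes that SymbolBody is the class meant for
the characters after the head of a Symbol, that the exponent of a Float
is an optional group, and that the braced hex escape needs at least one
digit. It also includes the two parenthesis tokens of section 1.4.1.

```python
import re

# Character classes of section 1.1 as regex fragments.
ESC = r'(?:[\x09-\x0D\x20]+\\|["\\n]|x\{[0-9A-Fa-f]{1,5}\}|x[0-9A-Fa-f]{2})'
SYMBOL_HEAD = r"[\x21\x24-\x27\x2A-\x2D\x2F\x3A-\x5A\x5D-\x7A\x7C\x7E]"
SYMBOL_BODY = r"[\x21\x24-\x27\x2A-\x5B\x5D-\x7A\x7C\x7E]"

# Alternatives are tried in order, so Float comes before Integer.
TOKEN = re.compile(
    r"(?P<Float>[+-]?[0-9]*\.[0-9]*(?:[eE][+-]?[0-9]+)?)"
    r"|(?P<Integer>[+-]?[0-9]+)"
    r"|(?P<Symbol>(?![+-][.0-9])" + SYMBOL_HEAD + SYMBOL_BODY + r"*)"
    r'|(?P<String>"(?:[\x20-\x21\x23-\x5B\x5D-\x7E]|\\' + ESC + r')*")'
    r"|(?P<LParen>\()"
    r"|(?P<RParen>\))"
    r"|(?P<Space>[\x09-\x0D\x20]+)"
)


def tokenize(text):
    """Split text into (kind, text) pairs, dropping whitespace between tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if match.lastgroup != "Space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With the opposite order, the input 1.5 would be split into the Integer
1 followed by the Float .5, which is why the order matters for a lexer
that matches prefixes.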

1.4. Lists

Lists need not be implemented as chains of pairs: the parser may return
them as arrays or similar. There is no support for improper lists or
circular lists.

1.4.1. lexically

>	LParen  = /\(/
>	RParen  = /\)/

This is the end of the lexical level.
Its terminals are all the rules of sections 1.3 and 1.4.1.

1.4.2. grammatically

>	List    = LParen + any_number_of(Element) + RParen
>	Element = String | Integer | Float | Symbol | List
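The two grammar rules can be sketched as a recursive-descent parser.
The Python sketch below assumes its input is already a sequence of
(kind, text) pairs, where kind is the name of a terminal (LParen,
RParen, Symbol, Integer, Float, String); that representation and the
name parse_list are illustrative choices, not part of the format. Lists
are returned as plain arrays (Python lists), not as chains of pairs.

```python
def parse_list(tokens, pos=0):
    """Parse one List starting at tokens[pos].

    Returns (elements, next_pos), where elements is a Python list whose
    items are either nested lists or unchanged (kind, text) atoms.
    """
    if pos >= len(tokens) or tokens[pos][0] != "LParen":
        raise SyntaxError("expected '('")
    pos += 1
    elements = []
    while pos < len(tokens):
        kind, text = tokens[pos]
        if kind == "RParen":
            return elements, pos + 1      # end of this List
        if kind == "LParen":
            child, pos = parse_list(tokens, pos)
            elements.append(child)        # Element = List
        else:
            elements.append((kind, text)) # Element = String | Integer | Float | Symbol
            pos += 1
    raise SyntaxError("missing ')'")
```

For example, the token sequence for (a (b 1) ()) yields an outer list
of three elements: the atom a, the inner list (b 1), and an empty list.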

----------------------------------------------------------------
2. Structure Layer

This layer shows how lists of the first level may be interpreted as more
specialized structures, or rather, as constructors for these.

2.1. Constructors

Once parsed, a list is a sequence of N elements. If N>0 then the list
has a head (the starting element) and a tail (the whole list except the
head).

The head is a constructor specifier, and is used to determine which
constructor is going to be used to transform that list into a specific
kind of data. The chosen constructor gets passed the tail elements for
further processing.

The constructor specifier is normally a Symbol.

The empty list () will construct the same thing as (nil).
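The dispatch described above can be sketched as a table lookup. In the
Python sketch below, everything named is an assumption made for
illustration: parsed lists are Python lists, symbols are instances of a
hypothetical Symbol class, the table is a dict from names to functions
receiving the unprocessed tail, and nil, true and false are mapped to
None, True and False.

```python
class Symbol(str):
    """A parsed Symbol, kept distinct from ordinary strings."""


def construct(node, table):
    """Turn a parsed element into data by dispatching on the head of a list."""
    if not isinstance(node, list):
        return node                        # atoms are already values
    if not node:
        node = [Symbol("nil")]             # () constructs the same thing as (nil)
    head, tail = node[0], node[1:]
    if not isinstance(head, Symbol):
        raise TypeError("this sketch only handles Symbol constructor specifiers")
    if head not in table:
        raise KeyError(f"no constructor named {head}")
    return table[head](tail, table)        # the constructor receives the tail


# A hypothetical constructor table with a few predefined entries.
TABLE = {
    "nil": lambda tail, table: None,
    "true": lambda tail, table: True,
    "false": lambda tail, table: False,
    "Array": lambda tail, table: [construct(e, table) for e in tail],
}
```

A user-defined constructor would, under the same assumptions, simply be
one more entry added to such a table.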

2.2. Predefined Constructors

2.2.1. Built-ins: Symbol, Integer, Float, String

>	Symbol
>	Integer
>	Float
>	String

Each of these constructors takes one String argument that is unparsed;
therefore, in (String "\"foo\""), the string will be parsed twice, and
so the result has neither quotes nor backslashes; and (Integer "4219") is
completely equivalent to the four-digit sequence 4219.

An exception is that the Symbol constructor has no restriction, and
therefore allows for otherwise unparseable symbols. If for some reason
you need :() to be a symbol, this is expressible as (Symbol ":()").
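A minimal Python sketch of two of these built-ins, under the assumption
that the String argument has already been decoded by the character
layer. The names make_integer, make_symbol and the Symbol class are
hypothetical.

```python
import re


class Symbol(str):
    """A Symbol value, kept distinct from ordinary strings."""


# The Integer rule of section 1.3.
INTEGER = re.compile(r"[+-]?[0-9]+")


def make_integer(text):
    """(Integer "4219"): re-read the argument with the Integer rule."""
    if not INTEGER.fullmatch(text):
        raise ValueError(f"not an Integer literal: {text!r}")
    return int(text)


def make_symbol(text):
    """(Symbol ":()"): no syntax check, any string is accepted as a name."""
    return Symbol(text)
```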

2.2.2. Containers

It is recommended that those map to structures that are easy to
manipulate rapidly.

>	Array

zero or more elements, indexed by their relative positions, starting at
zero.

>	Hash

zero or more unordered key/value pairs, where the "values" are indexed
by the "keys".

All keys should belong to different "equivalence partitions"; two keys
in the same partition clash. If a clashing Hash is encoded as a Hash
anyway, the issue may be resolved by keeping only the last of the
clashing pairs, or any one of them.
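The "last one wins" resolution can be sketched as follows. This section
does not say how the pairs are laid out as arguments, so the flat
alternating key, value, key, value sequence used here is an assumption,
as is the use of Python equality to decide which keys fall into the
same equivalence partition. The name make_hash is hypothetical.

```python
def make_hash(items):
    """Build a dict from a flat key, value, key, value, ... sequence."""
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    result = {}
    for key, value in zip(items[0::2], items[1::2]):
        result[key] = value    # a later clashing key overwrites the earlier value
    return result
```

Under these assumptions the Integer 1 and the Float 1.0 clash, because
Python treats them as equal keys; another host language may partition
keys differently.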

2.2.3. Some Constants

>	(nil)
>	(false)
>	(true)

(nil) is distinct from (false) and from the empty array (Array).

However, () is the same as (nil), as described above.

2.2.4. Other

>	Range

Takes a lower bound and an upper bound argument, both Integers, and an
optional third argument "exclude_end" which indicates that the upper
bound is not included (if true; the default is false).
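As a sketch, a Range constructor could map onto Python's built-in
range, whose upper bound is always excluded; the default inclusive
behaviour is then obtained by adding one. The name make_range is
hypothetical.

```python
def make_range(lower, upper, exclude_end=False):
    """Return the Integers from lower to upper, including upper by default."""
    if not isinstance(lower, int) or not isinstance(upper, int):
        raise TypeError("both bounds must be Integers")
    # Python's range excludes its stop value, so shift it when the end is included.
    return range(lower, upper if exclude_end else upper + 1)
```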

2.3. User-defined Constructors

User-defined constructors are added by hooking into the table of
constructors provided by the parsing tools.

----------------------------------------------------------------
3. Type System

To each constructor corresponds a type.

Other types may exist, that are combinations/specializations of other
types.

3.1. Types

A type is an object that represents a set of other objects for purposes
of reasoning about the system. Types may be interfaces, mixins,
classes, etc.

You normally use the Type constructor with one argument of Symbol type.
This will look up a Type in the Constructor table and the Type table.

When passed additional arguments, the type will be queried for a more
specialized version of itself, and therefore will act as a type
template.

3.2. Types for Predefined Constructors

>	(Type Symbol)
>	(Type Integer)
>	(Type Float)
>	(Type String)
>	(Type Array)
>	(Type Hash)
>	(Type nil)
>	(Type false)
>	(Type true)

3.3. Fundamental Type Templates

Type Any is a template type that is the union of two or more types.
Therefore if you want to express "either an Array or nil" you can
write (Type Any (Type Array) (Type nil))

Type All is a template type that is the intersection of two or more
types.

Type Choice is a template type that is the union of one or more
*values*, rather than types.
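One way to picture the three templates is to model a type as a
membership test. The Python sketch below takes that view; it is an
assumption made for illustration, as are the names type_any, type_all,
type_choice, is_array and is_nil, and the mapping of nil to None.

```python
def type_any(*types):
    """Union: a value belongs if at least one member type accepts it."""
    return lambda value: any(accepts(value) for accepts in types)


def type_all(*types):
    """Intersection: a value belongs only if every member type accepts it."""
    return lambda value: all(accepts(value) for accepts in types)


def type_choice(*values):
    """Union of values: a value belongs if it equals one of the listed values."""
    # The type check keeps Python's True from being accepted as the Integer 1.
    return lambda value: any(type(value) is type(c) and value == c for c in values)


# Hypothetical base types, modelled as predicates.
is_array = lambda value: isinstance(value, list)
is_nil = lambda value: value is None

# (Type Any (Type Array) (Type nil)): "either an Array or nil"
array_or_nil = type_any(is_array, is_nil)
```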

3.4. Fundamental Types

>	(Type Object)

The type that is the union of everything, including nil. (most other types
*don't* include nil, contrary to some other languages' types)

>	(Type Boolean)

is the same as (Type Any (Type true) (Type false))

>	(Type Type)

is the type of all types.

>	(Type Class)

Is the type of all classes: every Object is called an "instance of"
some Class.
This is a subset of type Type.

>	(Type Nothing)

has no elements, and therefore may describe certain impossible
structures.

3.5. Tuple

>	(Slot <symbol> <type or symbol>)

The first argument is a Symbol for the name of the slot.

The second argument is a Type for the constraint on the type of its
values, or alternatively a Symbol, which removes the need for wrapping
it in a Type constructor.

>	(Rest <symbol> [<type or symbol> ...])

The first argument is a Symbol for the name of the slot.

The N other elements are a list of Types or Symbols (as for the second
argument of Slot) that represents a sequence of type constraints.

A Rest describes a sequence of M*N arguments where the N type
constraints are applied cyclically. So (Rest blah Symbol Object)
describes a rest-list of an even number of elements, alternating
between Symbol and Object; (Rest blah Integer) is for any number of
Integers; and (Rest blah Object) is for no constraint at all.
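The cyclic application of the N constraints can be sketched as below.
Types are modelled as predicates and the names matches_rest, Symbol,
is_symbol, is_integer and is_object are hypothetical; a Rest with no
constraints at all is assumed to accept only the empty sequence, since
M*0 is always 0.

```python
class Symbol(str):
    """A Symbol value, kept distinct from ordinary strings."""


def matches_rest(values, constraints):
    """Check that values form M groups of N items matching the N constraints."""
    n = len(constraints)
    if n == 0:
        return len(values) == 0
    if len(values) % n != 0:
        return False                       # not a whole number of groups
    return all(constraints[i % n](value) for i, value in enumerate(values))


# Hypothetical predicates standing in for (Type Symbol), (Type Integer), (Type Object).
is_symbol = lambda value: isinstance(value, Symbol)
is_integer = lambda value: isinstance(value, int)
is_object = lambda value: True
```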

>	(Type Tuple <slot_list> [rest <rest>])

is a template type for describing structures (arrays) where you have a
sequence of named elements. Remaining elements may be collected into a
single named Array.

The slot list is an Array of Slots. If the described structure has a
fixed number of elements, there are no further arguments in the
description. Otherwise, there are two more arguments: the symbol "rest"
followed by a Rest.

Therefore, you may almost describe Tuple as a Tuple:

>	(Type Tuple
>	  (Array
>	    (Slot slot_list (Type Array)))
>	  rest (Rest options (Type Choice rest) Rest))

except that the rest of this array occurs at most once.
(this may suggest adding keyword arguments...)
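To illustrate how a Tuple description names the elements of an array,
here is a sketch that binds values to slot names and collects the
remaining elements under the name of the Rest. The representation is an
assumption: slots are (name, predicate) pairs, the optional rest is a
(name, list of predicates) pair, and bind_tuple is a hypothetical name.

```python
def bind_tuple(values, slots, rest=None):
    """Return a dict mapping slot names (and the rest name) to values."""
    if len(values) < len(slots) or (rest is None and len(values) > len(slots)):
        raise ValueError("wrong number of elements")
    bound = {}
    for (name, accepts), value in zip(slots, values):
        if not accepts(value):
            raise TypeError(f"slot {name} rejects {value!r}")
        bound[name] = value
    if rest is not None:
        rest_name, constraints = rest
        extra = list(values[len(slots):])
        n = len(constraints)
        if extra and (n == 0 or len(extra) % n != 0):
            raise ValueError("rest elements do not form whole groups")
        for i, value in enumerate(extra):
            if not constraints[i % n](value):   # constraints applied cyclically
                raise TypeError(f"rest {rest_name} rejects {value!r}")
        bound[rest_name] = extra                # collected into one named Array
    return bound
```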

----------------------------------------------------------------
4. Calling Across ObjectSpaces and Processes

4.1. Operation

>	(Operation <slot_list>
>	  [in <input_arguments>]
>	  [out <return_values>])

where <input_arguments> is of type (Type Type Tuple) and has a default
value of (Type Tuple (Array));

and <return_values> is of type (Type Any (Type Type Tuple) (Type nil))
and has a default value of nil.

When return_values is nil, the call is asynchronous, so only
exceptions are sent back to the caller; otherwise, the call is
synchronous, which means value(s) are returned, even in the case (Type
Tuple (Array)), where an empty tuple is returned (which only signals the
end of the call).
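The distinction between "no return values" (nil) and "an empty tuple of
return values" can be sketched with a small record type. The class and
field names below are hypothetical, nil is modelled as None, and a
Tuple type is modelled simply as a Python tuple of slot descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    inputs: tuple = ()                 # default: (Type Tuple (Array)), no arguments
    outputs: Optional[tuple] = None    # default: nil

    @property
    def synchronous(self):
        # nil means asynchronous; any Tuple type, even an empty one, means
        # the caller waits for a reply.
        return self.outputs is not None
```

Under this model, Operation() describes an asynchronous call, while
Operation(outputs=()) describes a synchronous call whose empty reply
only signals the end of the call.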

4.2. Protocol (aka Interface)

>	(Protocol
>		[inherits_from (Array [<protocol> ...])]
>		[operations [[<name> <operation>] ...]])

4.3. Call

(write me)

4.4. Proxy

(write me)

----------------------------------------------------------------
5. Miscellaneous

5.1. Todo

5.1.1. soon

	* optional args
	* keyword args
	* constraints on arrays/hashes
	* special args: receiver, selector, block, inlet

5.1.2. later

	* sets?
	* format versioning
	* integer binary, hex (and maybe octal and/or arbitrary base).
	* preserving precision
	* complete rpc specs
	* arbitrary annotations
	* schemas, schema templates
	* quoting
	* allocation patterns and sharing semantics
	* non-tuple inputs
	* non-tuple outputs

5.1.3. one day (maybe)

	* fixed-point, bcd, ratios, complex, base64 strings and integers
	* date, time, timestamp (date+time+timezone/shifttime)
	* "binary"/raw data, separate from String
	* integrate TypeSpaces from LGram2.txt here.
	* inter-language implementation inheritance?
	* binary form of layer #1
	* multiple replies
	* backward references, cyclic lists, improper lists
	* grids (n-d arrays)
	* implementation of XML-style markup languages inside the LGRAM
		framework.

----------------------------------------------------------------
end of file

cvs-version-id:
$Id: LGram.txt,v 1.3 2001/09/18 00:25:19 matju Exp $
