README for flex.rb -- PotterSoftware Ruby Flex Regexp Matching Extension
by Szabó Péter <pts@fazekas.hu>
this program is free software released under the GNU GPL (v2 or newer)
THIS SOFTWARE COMES WITH ABSOLUTELY NO WARRANTY! USE AT YOUR OWN RISK!

if you're wondering what flex.rb is, keep reading
the flex.rb project was started at Sat Dec  9 12:47:17 CET 2000
the README was started at Sun Dec 10 11:01:33 CET 2000
see $VERSION in the extconf.rb file
see this file for documentation, purpose, usage, requirements
see this file for installation instructions
see the comments in flexinit.c for more (technical) documentation
see the __END__ of this file for revision history

What's flex.rb?
~~~~~~~~~~~~~~~
flex.rb is (another) regexp matching library for the Ruby language. flex.rb
is more than 3 times faster than Ruby regexps, plus it supports matching
text arriving in multiple parts (e.g. via async, non-blocking I/O). flex.rb
embeds the GNU Flex 2.5.4 (fast lexical analyzer generator) as an engine.

flex.rb's key features:

-- fast: more than 3 times faster than Ruby regexps (the speed is achieved
   by always matching with DFAs -- Deterministic Finite Automata)
-- incremental, async, non-blocking: supports text arriving in multiple parts
-- multiple regexps can be matched simultaneously (matching stops when any
   of the regexps match -- the index of the matcher is returned)
-- 8-bit clean (I think Ruby regexps are 8-bit clean, too)
-- treats \n logically (Ruby has some glitches around ^, $, \A, \Z and \n)
-- the original flex sources and binaries are not needed to compile flex.rb
-- flexinit.c explains itself: it's easy to extend (-- or at least it was)
-- flex.rb is dynamic: regexps (regexp groups) can be compiled at run-time
   any number of times. Contrast: the flex program itself is static.

But flex.rb has a couple of drawbacks:

-- Flex regexp syntax differs from Ruby regexp syntax
-- no ()s for substring extraction
-- no \1 etc. for backreference
-- no regexp substitution (regsub)
-- no splitting
-- no multiple character sets (no EUC, SJIS etc.)
-- the flex.rb project is in beta status (with no known bugs)
-- flex.rb is not shipped with Ruby by default (I hope this will change.)
-- flex.rb is an extension, and requires a C compiler (some sort of GCC) to
   be installed
-- for troubleshooting, the user should understand how GNU Flex works
-- the interface is not complete yet (but definitely usable)
-- it's sometimes annoying to type \\\\\\\\ to properly quote metachars

What's new?
~~~~~~~~~~~
For details, see revision history near the end of this document.

Between 0.07 and 0.08 there were tons of improvements: functionality and
features grew three- to fourfold. flex.rb can now be used in real-life
situations (explained later). The API is almost complete.

If you've tried to use 0.07, please forget it completely, and start reading
this README as if it were a brand new product to you. (flex.rb is compatible
with 0.07, but the emphasis has completely moved to the new features, and most
old functionality is considered obsolete.) Pay special attention to the
Flex.most() method, which is new in 0.08 and is superior to `=~' and `go'.

When is flex.rb most useful?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think the most common usage of flex.rb is with async I/O. Let's imagine
that we write a webserver and we need to parse HTTP headers. We could use
traditional regexps to parse the header, and extract the needed information
-- as soon as it arrives. But how can we decide whether the full request has
already arrived? A regexp match sounds like a good idea:
/\A(GET|POST|HEAD) \/(.*)/i etc. But what if our regexp doesn't match? How
can we determine that there was no match, ...

1. ... because the full request hasn't arrived yet (e.g. "GE" has arrived)
2. ... because it's a bad request (e.g. "FOO" has arrived)

With traditional regexps this job gets messy. This is where flex.rb comes
in. A flex.rb match returns one of the following values:

0: no regexp has matched so far, more characters are needed to decide
1: no regexp has matched, and there cannot be a match even with more chars
2: regexp 2 has matched (regexps are numbered from 2)
3: regexp 3 has matched
4: ...
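This return-code convention can be tried out without flex.rb. The sketch
below is not flex.rb code: `classify' and `METHODS' are hypothetical names,
and plain prefix tests on a fixed list of literal method names stand in for
a real regexp engine. It shows why "GE" gives 0 while "FOO" gives 1:

```ruby
# Hypothetical illustration, not part of flex.rb: literal "rules" numbered
# from 2, classified with the same 0 / 1 / >=2 convention as flex.rb.
METHODS = ['GET ', 'POST ', 'HEAD '].freeze

def classify(received)
  METHODS.each_with_index do |method, i|
    return i + 2 if received.start_with?(method)  # rule numbers start at 2
  end
  # Still a prefix of some method name: more characters may produce a match.
  return 0 if METHODS.any? { |method| method.start_with?(received) }
  1  # no continuation of `received' can ever match
end

p classify('GE')      #: 0, need more input
p classify('FOO')     #: 1, hopeless
p classify('HEAD /')  #: 4, the third rule
```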

I think we can use flex.rb most productively in the following way: while
reading the data, we use a simplistic regexp with flex.rb (the `most' method)
to determine whether everything has arrived. As soon as this happens, we use
a tricky, versatile, traditional Ruby regexp (with lots of ()s, \1s etc.) to
extract the required information.
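A sketch of the second step, assuming the first step has already reported
that the whole request line is in: an ordinary Ruby regexp with ()s pulls
out the pieces. The helper `extract_request' and the exact regexp are
illustrative assumptions, not part of flex.rb:

```ruby
# Hypothetical second-stage parser: run it only after the (flex.rb) check
# has reported that the complete request line has arrived.
REQUEST_LINE = %r{\A(GET|POST|HEAD) (?:http:/)?/(\S*)(?: HTTP/(\d)\.(\d))?\r?\n\z}i

def extract_request(line)
  m = REQUEST_LINE.match(line)
  return nil unless m  # not a request line we understand
  { method: m[1].upcase, path: m[2], version: m[3] && "#{m[3]}.#{m[4]}" }
end

p extract_request("GET /index.html HTTP/1.0\r\n")[:path]  #: "index.html"
p extract_request("FOO /\r\n")                           #: nil
```

The split keeps the per-chunk work in flex.rb, while the slower but more
powerful Ruby regexp runs only once, on the complete line.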

Who is responsible for flex.rb?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
entirely Szabó Péter <pts@fazekas.hu>

There is no homepage, no mailing list, no downloads, no mirrors, no CVS
repositories (yet).

You may be able to download at http://www.inf.bme.hu/~pts/flex_rb-latest.tar.gz

The project was in beta status at Sun Dec 17 17:08:36 CET 2000. At that time
there were no known bugs that made usage impossible. There were no bug reports.

Installing flex.rb as root
~~~~~~~~~~~~~~~~~~~~~~~~~~
There are no surprises here: it's just like installing any other Ruby extension.

0. Obtain Ruby and flex.rb (e.g. starting from a Freshmeat search)
1. Install Ruby >=1.6.1 (with `make install')
2. Make your $PATH contain ruby (with `export PATH="$PATH:..."')
3. Unpack the flex.rb distribution. Chdir to the newly created directory.
4. Run `ruby extconf.rb'
5. Run `make'
6. Run `./wrap_ruby test01.rb'
7. Run `make install' or `make site-install'
8. Run `ruby test01.rb'
9. Run `make clean; ruby test02.rb' to test it.

Installing flex.rb as non-root
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is no generic way. I'll show my preferred way.

1. Get and unpack Ruby >=1.6.1 to $HOME/src (or any other dir -- change the
   instructions appropriately)
2. In $HOME/src/ruby-1.?.* do `configure', `make'. (_no_ `make install')
3. Chdir to $HOME/src/ruby-1.?.*/ext.
4. Unpack the flex.rb distribution. The directory
   $HOME/src/ruby-1.?.*/ext/flex-?.?? will be created.
5. [Instead of typing 'ruby', always type './wrap_ruby'.]
6. Run `./wrap_ruby extconf.rb'
7. Run `make' (_no_ `make install')
8. Run `./wrap_ruby test01.rb' to test it.
 
I have some tips if you don't like my way:

-- give the correct -I option to `ruby' so it will find extconf.rb.
-- give option `-I$DIR_OF_flex.rb.so' to `ruby' on startup. E.g.

	ruby -I/tmp /tmp/test01.rb

Using flex.rb (tutorial)
~~~~~~~~~~~~~~~~~~~~~~~~
To use flex.rb's functionality in your Ruby scripts, you have to install
flex.rb 1st (see earlier in this document). A 'require "flex"' line is
required near the beginning of the script. If Ruby complains 'No such file to
load...', then you haven't installed the product (flex.so and lib/*)
correctly.

Simple string matches
^^^^^^^^^^^^^^^^^^^^^
Almost all of flex.rb's functionality is in `class Flex'. A Flex object is
created for each regexp, and strings are matched by the methods of the Flex
object. (This is similar to Ruby's internal regexps.) You may create any
number of Flex objects, any time in your scripts.

Let's try a simple match 1st:

	require "flex"
	f=Flex.new 'foo+bar';
	p( f =~ 'fobar' )		#: false !!
	p( f =~ 'foo+bar' )		#: false
	p( f =~ 'foobar' )		#: 2 (TRUE)
	p( f =~ 'fooobar' )		#: 2 (TRUE)
	p(      'fooobar' =~ f )	#: 2 (TRUE)
	p( f ==='fooobar' )		#: 2 (TRUE)
	p(      'fooobar'=== f )	#: false, incorrect syntax (*)
	p( f.matchh('fooobar') )	#: 2 (TRUE), matchh is not a typo!
	p( f =~ 'Gimme foobar!' )	#: false, not a full match
	f.free				#  optional
	# f =~ 'foo'			#: exception: f is freed

Note that instead of `true', a Fixnum is returned, which indicates some
property of the actual match. It will be explained later. For now, you
should keep in mind that `=~' returns `false' if there was no match, and
some TRUE value if there was.

The names `=~', `===' and `matchh' are aliases for the same method. As you
can see, flex.rb regexps can be used very similarly to Ruby regexps. And in
fact, flex.rb regexps are more than 3 times faster! But -- of course
-- there is a tradeoff between speed and functionality: internal Ruby
regexps are slower but more powerful, flex.rb regexps are faster.

There are some syntactic rules and differences you should know:

-- Pay attention to proper escaping (quoting) of specials.
   Special characters are: . [ ] \ * + ? { , } " ( | ) / ^ $ < >
   You (almost always) _must_ escape these with backslash (\) if you mean
   that character literally, and not its special meaning. The following
   chars are not (always) special in Ruby regexps, but are special in flex.rb
   regexps: { } < > " and /. Don't forget to escape them!

-- The subpattern variables $1, $2 etc. are not set, and there is no
   corresponding MatchData object. (However, it is possible to get back the
   _whole_ string matched; it will be explained later, near `yytext'.) This
   is intentional. Functionality was traded for performance.

-- Backreferences (\1, \2 etc.) cannot be used. Again, this is intentional.
   Functionality was traded for performance.

-- Line breaks can only be matched with `\n'. The (sometimes confusing, but
   useful) syntaxes ^, $, \A, \Z are not supported. This may change in the
   future.

-- `\b' is not supported. This may change in the future.

-- \s, \S, \w, \W, \d, \D (and similar) are not supported. This may change
   in the future.

-- flex.rb is 8-bit clean. The NUL character must be escaped as `\000' in
   the regexps. Strings matched may contain any number of NULs. Note,
   however, that there is no Unicode, i18n or multiple charset support: all
   you get is 8-bit clean bytes. This will probably never change. (But in
   fact, you can create your regexps -- with _very_ much effort -- to match
   UTF-8 or something similar.)

-- For complete explanation about what is supported, see the flex(1) manual
   page or the `flex' info page.

Compiled regexps may take a lot of memory (and, at the same time, a lot of
time to compile), but once the regexp is compiled (Flex.new has returned),
no extra memory is required. Ruby reclaims memory by garbage collection, so
you're advised to call 'f.free' and 'GC.start' when you know you'll never use
the regexp again, but you don't have to.

Waiting for our string to arrive
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's try to match the 1st line of an HTTP request from STDIN. The
requirements are:

1. STDIN may be a slow network connection. The header may arrive in multiple
   chunks (parts).
2. We want to extract the HTTP header from as few chunks as possible.
3. If the input is not an HTTP header, we want to give an error message as
   soon as possible.

First try, without flex.rb:

	# examples/ex_rr.rb
	s=STDIN.gets
	if s=~/^(GET|HEAD|POST) (http:\/)?\/(.*)( HTTP\/.\.)?\r?\n$/;
	  print "OK: #{s.inspect}.\n"
	else
	  print "Bad header!\n"
	end

This seems to work fine, but requirement 3 is not satisfied. Suppose that
'FOO' arrives on STDIN, and then -- for a long time -- nothing else.
`STDIN.gets' never returns: it waits for a newline forever. Bad luck.

This is where flex.rb comes in and solves the problem:

	# examples/ex_fr.rb
	require "flex"
	f=Flex.new '(GET|HEAD|POST) (http:\/)?\/(.*)( HTTP\/.\.)?\r?\n';
	def next_chunk(fi)
	  begin
	    return fi.sysread(4096)
	  rescue EOFError
	    return false
	  end
	end
	# MARK
	nil while (s=next_chunk STDIN) and 0==(ret=f.most s)
	raise SyntaxError, "premature EOF" unless s
	raise SyntaxError, "not a HTTP header" if ret==1
	print "OK, #{f.yyleng} bytes in header.\n"
	print "Extra bytes after header: #{f.ahead.inspect}.\n";

Try the speed difference with

	(echo -n 'GEX'; sleep 3) | ruby examples/ex_rr.rb	# 3s
	(echo -n 'GEX'; sleep 3) | ruby examples/ex_fr.rb	# immediately
	(echo -n 'GET'; sleep 2; echo ' /index.html';
         sleep 33333) | ruby examples/ex_fr.rb			# 2s

The example above should be easy to understand. The next_chunk() function is
needed because IO#sysread raises EOFError at end of file, which we dislike.
All the interesting things happen in the one-liner while loop:

	nil while (s=next_chunk STDIN) and 0==(ret=f.most s)

We read some bytes into `s', and call f.most(s). `f.most' is like `f =~',
but it doesn't need the whole string to be present: when it determines that
more data is needed, it returns `0'. Thus we test for `0' in the while loop,
and read again if we have to.
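The control flow of this loop can be tried without flex.rb. In the sketch
below StringIO stands in for STDIN (StringIO#sysread also raises EOFError at
the end of the input), and `NewlineWaiter' is a hypothetical stand-in for `f'
whose `most' merely waits for a newline; it is not flex.rb's matcher. A chunk
size of 4 forces the request line to arrive in several pieces:

```ruby
require 'stringio'

# Same wrapper as in the example: turn EOFError into `false'.
def next_chunk(fi, size = 4096)
  fi.sysread(size)
rescue EOFError
  false
end

# Hypothetical stand-in for `f' (NOT flex.rb): most(s) collects the chunks
# and returns 0 until a newline has arrived, then 2 (like a rule-2 match).
class NewlineWaiter
  attr_reader :received

  def initialize
    @received = String.new
  end

  def most(s)
    @received << s
    @received.include?("\n") ? 2 : 0
  end
end

# The loop from the example, with the input and chunk size as parameters.
def read_request_line(io, chunk_size)
  f = NewlineWaiter.new
  ret = nil
  nil while (s = next_chunk(io, chunk_size)) and 0 == (ret = f.most(s))
  raise SyntaxError, 'premature EOF' unless s
  [ret, f.received]
end

p read_request_line(StringIO.new("GET / HTTP/1.0\r\nrest"), 4)
#: [2, "GET / HTTP/1.0\r\n"]
```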

So far, so good.

But how can we extract the full header that has arrived? One possibility is
to concatenate the chunks by hand, and the other possibility is to ask `f' to
remember:

	# examples/ex_fr.rb
	# modified after MARK
	f.opts |= Flex::OPT_REMEMBER
	nil while (s=next_chunk STDIN) and 0==(ret=f.most s)
	raise SyntaxError, "premature EOF" unless s
	raise SyntaxError, "not a HTTP header" if ret==1
	print "OK: header is #{f.yytext.inspect}.\n"
	print "Extra bytes after header: #{f.ahead.inspect}.\n";
	f.free

OPT_REMEMBER is switched off by default for performance reasons.

Note that _we_ give the data to f.most; it does not read STDIN
directly. This gives us _full_ control, because we can handle
connections, I/O and timeouts as _we_ wish. Also note that f.most never
blocks: it processes the string passed in the argument (or `false' on
EOF!) very quickly, and returns immediately with:

-- 0, if more data is needed to decide whether there is a match (and what
   kind of match it is -- see later)
-- 1 on mismatch, i.e. if there is no match, and whatever comes next, it
   won't match.
-- >=2 on match: the number of the matching rule is returned (which is
   always 2 if we have only one regexp -- see later)

We can always use `f.yyleng' to get the length of the last match. Also
`f.ahead' is always available (even without OPT_REMEMBER): it returns the
characters that have arrived but have not been matched yet.

If we set OPT_REMEMBER (with `f.opts |= Flex::OPT_REMEMBER'), we can use
`f.yytext' to get the last matching string. OPT_REMEMBER may be set and
reset (`f.opts &= ~Flex::OPT_REMEMBER') any time.
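Since `opts' is a bit set, these two idioms switch one option without
disturbing the others. The constant values in this sketch are made up; real
code should use flex.rb's own Flex::OPT_* constants:

```ruby
# Made-up flag values for this sketch only; use flex.rb's Flex::OPT_*.
OPT_RESET    = 1
OPT_REMEMBER = 2

opts = 0
opts |= OPT_RESET              # switch OPT_RESET on
opts |= OPT_REMEMBER           # switch OPT_REMEMBER on, OPT_RESET stays set
opts &= ~OPT_REMEMBER          # switch OPT_REMEMBER off, OPT_RESET stays set
p opts                         #: 1
p((opts & OPT_REMEMBER) != 0)  #: false
```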

Multiple regexps
^^^^^^^^^^^^^^^^
The constructor

	f=Flex.new('REGEXP');

is a special form

	f=Flex.new([ nil, nil, 'REGEXP2' ]);

of

	f=Flex.new([ 'FLEX-CMDLINE-OPTIONS', # may be nil
                     'FLEX-DECLARATIONS',    # e.g `%x ASCOND', may be nil
	             'REGEXP2',
	             'REGEXP3', ... ]);

FLEX-CMDLINE-OPTIONS are one-letter command line options from the flex(1)
man page. This is almost useless, since these options (mostly) don't have
any effect in flex.rb. Example: '-s' (which is switched on by default...).
You can use '-sW' to get a warning in the form of _one_ FlexWarning
exception.

FLEX-DECLARATIONS is the text one usually writes before the 1st '%%' line in
a Flex .l grammar file. If you don't know what to write there, just write
`nil', and enjoy the carefully and usefully set up defaults. (Also, if you
need `start conditions' (see later), you must declare them here.)

'REGEXP2', 'REGEXP3' etc. are regexps, which are matched simultaneously (i.e.
at the same time). `f.most' tries to match the longest possible string with
any of the regexps. `f.most' returns the number of the regexp actually
matched (>=2). Examples:

	require "flex"
	f=Flex.new([nil,nil,'foo*','foobar'])
	f.opts |= Flex::OPT_REMEMBER
	f.reset;
	p f.most("fo")				#: 0, cannot decide yet
	p f.most("o")				#: 0, not yet
	g=f.clone; p g.most("o")		#: 0, maybe more o's will arrive
	           p g.most(false), g.yytext	#: 2, "fooo", no that's all
	g=f.clone; p g.most("bar"), g.yytext	#: 3, "foobar"
	g=f.clone; p g.most("z"), g.yytext	#: 2, "foo"
	p g.ahead				#: "z", part of next match
	p f.most("b")				#: 0, cannot decide yet
	g=f.clone; p g.most("i"), g.yytext	#: 2, "foo"
	p g.ahead				#: "bi", part of next match
	g=f.clone; p g.most("ar"), g.yytext	#: 3, "foobar"
						#  starting a new match
	p g.most("fo")				#: 0, cannot decide yet
	p g.most(false), g.yytext		#: 2, "fo", we said EOF.
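The longest-match bookkeeping behind these results can be modelled in plain
Ruby. The sketch below is a hypothetical, simplified model, not flex.rb's
implementation: `MostSketch', `DFA' and `ACCEPT' are invented names, the DFA
for the two rules is written by hand (flex.rb generates its tables from the
regexps), the whole buffer is rescanned on every call, and skipping one
character on a mismatch is an assumption. It reproduces the return values
listed above:

```ruby
# Hand-built DFA for the example above: rule 2 is fo*, rule 3 is foobar.
# A missing transition means "stuck": no longer string can match any rule.
DFA = {
  0 => { 'f' => 1 },
  1 => { 'o' => 2 },
  2 => { 'o' => 3 },
  3 => { 'o' => 4, 'b' => 5 },
  4 => { 'o' => 4 },
  5 => { 'a' => 6 },
  6 => { 'r' => 7 },
  7 => {}
}.freeze
# Rule number reported by each accepting state.
ACCEPT = { 1 => 2, 2 => 2, 3 => 2, 4 => 2, 7 => 3 }.freeze

# Hypothetical, simplified model of aflex.most (not flex.rb's code).
class MostSketch
  attr_reader :yytext

  def initialize
    @buf = String.new  # characters received but not matched yet
    @yytext = nil
  end

  def initialize_copy(other)
    super
    @buf = @buf.dup    # a clone must not share pending input
  end

  def ahead
    @buf.dup
  end

  # chunk is a String, or false to signal EOF.
  def most(chunk)
    eof = (chunk == false)
    @buf << chunk unless eof
    state = 0
    best_len = nil
    best_rule = nil
    @buf.each_char.with_index do |ch, i|
      state = DFA[state][ch]
      break if state.nil?               # stuck: stop scanning
      next unless ACCEPT.key?(state)
      best_len = i + 1                  # longest match seen so far
      best_rule = ACCEPT[state]
    end
    # Input ran out while a longer match is still possible: wait for more.
    return 0 if !eof && state && !DFA[state].empty?
    if best_rule.nil?
      @buf.slice!(0, 1)                 # assumption: skip one char
      return 1
    end
    @yytext = @buf.slice!(0, best_len)  # consume the matched text
    best_rule
  end
end

f = MostSketch.new
p f.most('fo'), f.most('o')        #: 0, 0
g = f.clone
p g.most('z'), g.yytext, g.ahead   #: 2, "foo", "z"
```

A real scanner keeps its DFA state between calls instead of rescanning the
buffer; the rescan only keeps the sketch short.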

Matching multiple times
^^^^^^^^^^^^^^^^^^^^^^^
	require "flex"
	f=Flex.new( [nil,nil,'.*\n?'] );
	f.opts=Flex::OPT_REMEMBER
	def next_chunk(fi)
	  begin
	    return fi.sysread(4096)
	  rescue EOFError
	    return false
	  end
	end
	# f.reset
	while f.each_token(next_chunk(STDIN)) {|i|
	  # called with f.yytext=="" if there is no \n at EOF
	  print "#{f.yyleng} chars: #{f.yytext.inspect}\n"
	}; end

The example above reads STDIN line-by-line. This is not very interesting,
since we can do it more easily, without flex.rb:

	STDIN.each_line("\n") { |s|
	  print "#{s.length} chars: #{s.inspect}\n"
	}

But flex.rb has a power that each_line() lacks: you can specify what you
mean by `line' with an arbitrary regexp(!). \n does not need to be a
separator, and you can impose powerful syntax restrictions on the contents.

(In the same way you can read a text word by word or a program source code
token by token. That's particularly useful for writing syntax
highlighters.)

`f.each_token(s)' calls `f.most(s)', and then `f.most("")', `f.most("")'...
enough times. For each match and mismatch (1, >=2), the given block is
called with the return value of f.most(). We need the nested loop (`while'
and `each_token'), because a line can arrive in multiple chunks AND a single
chunk may contain multiple lines.
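The buffering that makes the nested loop necessary can be seen in a
pure-Ruby stand-in. `LineSplitter' is a hypothetical name, not part of
flex.rb; it only splits at \n, and unlike the rule '.*\n?' above it yields
nothing (rather than an empty token) when the input ends right after a \n:

```ruby
# Hypothetical stand-in for each_token with a `line' rule (not flex.rb):
# keeps the unfinished tail between calls and yields every complete line.
class LineSplitter
  def initialize
    @pending = String.new
  end

  # chunk is a String, or false at EOF. Returns false at EOF, true otherwise,
  # so it can drive the same `while ... end' loop as in the example above.
  def each_token(chunk)
    if chunk == false
      yield @pending.slice!(0..-1) unless @pending.empty?  # unterminated tail
      return false
    end
    @pending << chunk
    while (nl = @pending.index("\n"))
      yield @pending.slice!(0..nl)  # one chunk may contain several lines
    end
    true
  end
end

lines = []
splitter = LineSplitter.new
["ab\ncd", "e\n\nf", false].each do |chunk|
  splitter.each_token(chunk) { |line| lines << line }
end
p lines  #: ["ab\n", "cde\n", "\n", "f"]
```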

Start conditions
^^^^^^^^^^^^^^^^
This part of the tutorial is incomplete.

flex.rb supports start conditions, as documented in the flex(1) man page.
You can use `f.yystart', `f.yystart=...' and `f.begin(...)' to manipulate
start conditions. `f.reset' does not revert to INITIAL.

See `test01.rb' for examples.

Memory and CPU time requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I have terrible news. Compiled regexps may take very much -- exponential --
memory, but once the regexp is compiled (Flex.new is done), no extra memory
is required. Ruby uses garbage collection based memory reclaiming (which
reclaims memory some time later than it is first possible), so you're
advised to call 'f.free' and 'GC.start' when you know you'll never use that
regexp again.

I repeat that compiled regexps may take very much memory, and compiling
regexps may take very much time: even exponential in the length of the regexp.
(e.g. 'Flex.new(".*b[ab]{0,15}")' is a killer. If you write $n$ instead of
15, memory and computation requirements are proportional to $2^n$). One
should have a good understanding of formal language theory (developed mainly
by Chomsky) to properly estimate requirements. The rules of thumb are:

-- .* and similar constructs should be near the end of the regexp
-- counted iteration (the form `R{L,U}'), where $L>=2$ and $U<infinity$
   should be used with care
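The blow-up can be checked by counting the DFA states that the textbook
subset construction produces for .*b[ab]{0,n}. The helper below is a
hypothetical sketch (not taken from Flex's sources), restricted to the
alphabet {a, b}, and it does not minimise the result; each state records
which of the last n+1 characters were b's, which gives 2^(n+1) states:

```ruby
# Hypothetical helper: number of DFA states produced by the (unminimised)
# subset construction for .*b[ab]{0,n}, over the alphabet {a, b}.
# NFA states: 0 is the .* loop (always active); k in 1..n+1 means
# "a b was read k-1 characters ago and the tail may still accept".
def dfa_state_count(n)
  start = [0].freeze
  seen = { start => true }
  queue = [start]
  until queue.empty?
    active = queue.shift
    %w[a b].each do |ch|
      # Every active tail position advances by one; position n+1 falls off.
      succ = active.reject(&:zero?).map { |k| k + 1 }.select { |k| k <= n + 1 }
      succ << 1 if ch == 'b'           # a new b may begin the b[ab]{0,n} part
      succ = ([0] + succ).sort.freeze  # state 0 stays active because of .*
      next if seen.key?(succ)
      seen[succ] = true
      queue << succ
    end
  end
  seen.size
end

# prints 2, 4, 8, 16 and 32 states for n = 0..4
(0..4).each { |n| puts "n=#{n}: #{dfa_state_count(n)} states" }
```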

Some results for the worst case on my Intel Celeron 333 MHz:

-- .*b[ab]{0,15}
    35,21s user 0,06s system 99% cpu  0:00:35 total
-- .*b[ab]{0,16}
   182,31s user 0,23s system 97% cpu  3:08,05 total (8.5 Mb of RAM needed)
-- .*b[ab]{0,17}
   822,75s user 0,63s system 98% cpu 13:54,32 total (17 Mb of RAM needed)
   gcc -O3 -c lex.yy.c  48,35s user 3,63s system 94% cpu 54,932 total
   -rw-r--r--   1 pts      guests    5513884 Dec 17 17:30 lex.yy.o

These terrible results are not flex.rb's fault, not Flex's fault and not due
to incompetence of the authors. Flex is a generator of fast scanners, not a
fast generator of scanners. The mathematical background of Flex clearly shows
that innocent-looking regexps can impose serious hardware requirements to
compile. The method Flex uses guarantees fast scanners, with constant (maybe
very high -- but still independent of the strings matched) memory
requirements and running time linear in the length of the string matched. On
the other hand, Ruby regexps (and GNU regexps and Perl regexps) sometimes need
less memory, but matching a string may take exponential time. The underlying
algorithms are completely different, and it is a very tough (and -- as far as
I know -- unsolved, open) problem to keep both memory and CPU requirements
small, both while compiling and while matching.

Thus flex.rb and Flex itself will probably _never_ be improved in this
respect, or at least not until a brand new, superior algorithm is proposed.

The definitive API documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is not complete (yet). If you have questions after the tutorial, please
read `test01.rb', and see whether there is a test case that answers your
question. Or write me an e-mail.

OK, let's start it...

Constructing a Flex object
^^^^^^^^^^^^^^^^^^^^^^^^^^
Use

	aflex=Flex.new([...,..,'REGEXP2','REGEXP3',...])	# or
	aflex=Flex.new('REGEXP2')

The Flex object (aflex) consists of:

-- the scanner tables (final, determined by the regexp you specified to
   Flex.new(), shared between all clones)
-- the options (default to 0, see later)
-- the last string matched (`aflex.yytext')
-- the scanning state (we are sometimes in the middle of the scanning,
   because the input can arrive in multiple parts)

You can clone the object with

	other=aflex.clone

This clones all the properties above.

You can reset the scanner state with

	aflex.reset

This resets the last string matched and the scanning state to their
initial (Flex.new...) values. The options are not affected.

You can set several options at any time (see their meaning later):

	aflex.opts=0			# default
	aflex.opts=Flex::OPT_RESET	# reset to start position before scan
	aflex.opts=Flex::OPT_REMEMBER	# remember matched string in `aflex.yytext'
	aflex.opts=Flex::OPT_RESET|
	       Flex::OPT_REMEMBER	# both


[...]

Scanner methods
^^^^^^^^^^^^^^^
Once a Flex object (aflex) has been constructed (with the Flex.new(...), see
earlier), several methods of it can be called to scan/match strings against
the regexps specified in the constructor. These methods are called scanner
methods.

re_idx = (aflex =~ astring)
re_idx = aflex.matchh(astring)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fixnum re_idx;
Flex aflex;
String astring;

Tries to match the whole, full 'astring' against 'aflex'. Returns

-- 0 if 'astring' wasn't long enough (appending the appropriate chars to
'astring' may cause either matching or non-matching)

-- 1 if 'astring' didn't match, and there is no way to append characters
to 'astring' to make it match. This also happens when 'astring' was too
long (e.g. the regexp is /ab?/, 'astring' is "abc").

-- >=2 if one of the regexps in 'aflex' matched the whole 'astring'. The
index (in the constructor of 'aflex') of one of the matching regexps is
returned. If more than one matches, it is undefined which index is
returned (see the flex(1) man page for more details).

Matching starts at the beginning of the string, and ends at the end of the
string. This means: (1) the regexps of 'aflex' are wrapped into implicit
'^...$' chars, (2) an implicit 'aflex.reset' is called.

The number of characters examined is returned in 'aflex.n', which is always
'astring.length'.

Note: this is not a regular regexp match. In fact, it always returns a
Fixnum, which is true in Ruby.
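How a DFA arrives at these three answers for a full match can be sketched
with a hand-written transition table for the regexp ab? mentioned above.
`full_match', `TRANSITIONS' and `ACCEPTING' are hypothetical names; flex.rb
builds far larger tables automatically:

```ruby
# Hand-written DFA for ab? (a missing entry means the dead state).
TRANSITIONS = { 0 => { 'a' => 1 }, 1 => { 'b' => 2 }, 2 => {} }.freeze
ACCEPTING = [1, 2].freeze

# Hypothetical model of a full-string match with flex.rb's return codes.
def full_match(str)
  state = 0
  str.each_char do |ch|
    state = TRANSITIONS[state][ch]
    return 1 if state.nil?  # dead: no suffix can make the string match
  end
  # 0 is right here because every live state can still reach acceptance.
  ACCEPTING.include?(state) ? 2 : 0
end

p full_match('ab')   #: 2
p full_match('')     #: 0, an `a' must still arrive
p full_match('abc')  #: 1, too long
```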

re_idx = aflex.most(astring, [opts=aflex.opts])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fixnum re_idx;
Flex aflex;
String|FalseClass astring;
Fixnum opts;

Append astring to aflex's internal buffer (signal EOF with astring==false).
Try to match the regexps to the internal buffer. Find longest possible
match. At next call, remove previously matched text from the buffer.

[...]

This is the most useful scanning function. It is used for `normal' scanning.
See real-life example in test02.rb. See the tutorial earlier in this
document.

Return value is 0, 1 or >=2, just like in `aflex.matchh'. When nonzero is
returned, `aflex.yytext' can be used (if OPT_REMEMBER is set) to get the
string matched. (Note that this string can be shorter or longer than astring:
shorter when the longest match is shorter than astring; longer when a single
match was achieved by calling aflex.most multiple times.)

aflex.each_token(astring) do |re_idx| ... end
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Calls `aflex.most(astring)', and then `aflex.most("")' again and again,
passing matches (>=2) and mismatches (1) to the block.

This is the most common iterator construct for tokenizing files and input
streams.

See test02.rb for an example.

re_idx = aflex.go(astring, [opts=aflex.opts])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Fixnum re_idx;
Flex aflex;
String astring;
Fixnum opts;

This function is not really useful. If you are reading this, you are
probably looking for `aflex.most'.

Tries to match the beginning (prefix) of 'astring' against 'aflex'. Stops at
the 1st match. (Note: this is almost always _not_ what the user desires.)
Returns:

-- 0 if 'astring' wasn't long enough to match

-- 1 if 'astring' didn't match, and there cannot be a match with more chars

-- >=2 if one of the regexps in 'aflex' matched the beginning of 'astring'.
   If more than one is possible, it is undefined which index is returned.

'aflex.go' may be called multiple times provided that[...], but you are not
supposed to do so.

The number of characters examined is returned in 'aflex.n'. This is the
total length of the strings passed in successive calls to 'aflex.go'.

The state can be reset with 'aflex.reset'.

[...]

Old example with aflex.go
^^^^^^^^^^^^^^^^^^^^^^^^^
	require 'flex'
	f=Flex.new( [nil, nil, "al[vm]a", "bar"] );
	# ^^^ we have given 2 regexps with indices 2 and 3
	f.reset # we're starting from the beginning
	p f.go("ba") #: 0 ("ba" is a partial match -- more chars are needed)
	p f.go("r")  #: 3 ("bar" matches regexp 3)
	p f.go("xy") #: 1 ("barxy" is hopeless)
	f.reset
	p f.go("al") #: 0
	g=f.clone
	p f.go("ma") #: 2 ("alma" matches regexp 2)
	p g.go("va") #: 2 ("alva" matches regexp 2)

Good documentation
~~~~~~~~~~~~~~~~~~
I think good software documentation (for tiny (<8Mb), downloadable, free
software, such as those I occasionally write) should have the following
properties:

-- the exact name, version number, stage (stable, beta, alpha etc.), release
   date of the product, license
-- author's e-mail address, download URL
-- other meta-information (such as author's name, mailing lists etc.)
-- a brief, freshmeat-style description of the product (max 400 characters)
-- list of key features (max 2000 characters)
-- list of related other products and competitors
-- list of known bugs and missing features (max 2000 characters), especially
   those which are part of other products, but missing from this one
-- real-life situations where the product is indeed useful
-- dependencies (max 200 characters), with download URL
-- detailed installation instructions as root
-- detailed installation instructions as non-root
-- brief installation instructions for the dependencies
-- specification with all documented classes, methods, attributes,
   parameters, exceptions, constraints, assertions, side effects,
   incompatibilities
-- usage guidelines
-- examples for newbies, tutorial
-- UNIX man pages, if appropriate
-- test cases with expected results (preferably an automatically running test
   suite)
-- revision history (ChangeLog, CHANGES)
-- all documentation must be shipped with the product in electronic form (at
   least) in the English language

(I admit that the documentation of this product does not fulfill my
criteria (yet).)

Meaning of `flex -T -Ce' trace output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is technical, and mostly intended for myself.

...
#{start conditions are declared here (`INITIAL' is number 1, the last one is
number m)}


1	#{Rule 1}
2	#{Rule 2}
...
n	#{Rule n}
n+1	End Marker

...

DFA Dump:

state #1: #{StartCond 1 begins here at non-BOL}
	...
state #2: #{StartCond 1 begins here at BOL}
	...
state #3: #{StartCond 2 begins here at non-BOL}
	...
state #4: #{StartCond 2 begins here at BOL}
	...
...
state #(2*m-1): #{StartCond m begins here at non-BOL}
	...
state #(2*m):   #{StartCond m begins here at BOL}
	...
state #(2*m+1): #{`Flex scanner jammed' unreachable state}
	#{no rules here}
state #(2*m+2): #{first usual, normal, non-extra state}
	...
...
state #(2*m+1) accepts: [(n+2)]
state #(2*m+2...) accepts: #{rule_nr}
...

Equivalence Classes:

Revision history
~~~~~~~~~~~~~~~~
version 0.04
^^^^^^^^^^^^
Sat Dec  9 11:08:08 CET 2000:

Electric Fence +gdb didn't help much to find the memory allocation error.
Checkergcc was better.

version 0.05
^^^^^^^^^^^^
Sun Dec  9 02:30:32 CET 2000

version 0.06
^^^^^^^^^^^^
Sun Dec 10 02:53:49 CET 2000

Imp: multiple accepts: [...] is possible
	for -Ce -T, only the longer (!) rule accepts -- the regexp must
	be longer??. But I'm sure that only _1_ rule may accept. Period.
OK: exception handling (massloc_leave)
	Sun Dec 10 02:50:25 CET 2000
OK: memory handling
	Sun Dec  9 02:30:32 CET 2000
OK: auto-free memory after completion!
	Sun Dec  9 02:30:32 CET 2000

version 0.07
^^^^^^^^^^^^
Sun Dec 10 10:55:33 CET 2000

the Flex class with meaningful methods
speed test
Perl is not required any longer
the flex binary is not required any longer
official release
initial freshmeat announcement

version 0.08
^^^^^^^^^^^^
Mon Dec 11 22:04:31 CET 2000--
Thu Dec 14 22:31:17 CET 2000

no compile warnings
added file wrap_ruby
expanded README with `Installing flex.rb as non-root'
greatly increased Printbuf
Printbuf access from ruby
pos field in PtsPrintbuf_t
aflex.most (formerly morse_go)
testsuite with >120 tests at Thu Dec 14 17:41:19 CET 2000
testsuite with >220 tests at Sat Dec 16 16:21:25 CET 2000
flexruby.h moved to lib/flex2.rb
@extra moved to struct fmd (Sat Dec 16 17:03:55 CET 2000)
@t, @c moved to struct fmd (Sat Dec 16 17:08:59 CET 2000)
tutorial about aflex.most (Sun Dec 17 17:26:47 CET 2000)

OK : aflex.most was bad without OPT_REMEMBER (Sat Dec 16 14:54:50 CET 2000)
OK : file `depend' (Sat Dec 16 16:22:19 CET 2000)
OK : report only match for longest_match earlier...
     no-way-to-make-longer optimisation in flex_most()
     (Sat Dec 16 16:22:28 CET 2000)
OK : lib/*.rb (flexruby.h) (Sat Dec 16 16:22:32 CET 2000)
OK: counting matched characters: nchars, yyleng (Sat Dec 16 17:49:18 CET 2000)
OK : aflex.free (Sat Dec 16 18:12:37 CET 2000)
OK : printbuf self-append (Sat Dec 16 17:21:28 CET 2000)
OK : start conditions (Sun Dec 17 17:27:11 CET 2000)

Imp: C parsing test
Imp: line reading test
Imp: HTML parsing test
Imp: better docs
Imp: better warnings
Imp: no assertions in final release
Imp: limit on yyleng
Imp: <<EOF>>, ^, $
Imp: really free unused regexp from memory
Imp: max_len for each_token...
Imp: most() should not return 0 on FALSE

Sat Dec 16 11:19:48 CET 2000

version 0.09
^^^^^^^^^^^^
(bugfix release) at Tue May 15 17:18:39 MET DST 2001
Thu May 17 20:17:10 CEST 2001

OK : statisfied -> satisfied in the README
OK : minor doc fixes
OK : fixes bug01 reported by Tanaka Akira in flexinit.c
OK : fixes bug02 reported by Tanaka Akira in flexinit.c
OK : test case added to test01.rb for bug01

end of README
