Adding Ispell support to UdmSearch
==================================


UdmSearch 3.0 can keep ispell data in an SQL database, as the 2.x versions
did, or load ispell files from disk. Currently only the search frontends
(both CGI and PHP) can use ispell data stored in the SQL database.

When UdmSearch is used with ispell support, all words are normalized by
both the indexer and the search frontend. This makes it possible to find the
same word with different endings. For example, if the words "testing" or
"tests" are found in a document, the indexer stores the word "test" instead.
The search frontend likewise looks for the word "test" if "testing" or "tests"
is given in the search query. Note that this scheme gives up the ability to
search for an exact word form, but it usually reduces the size of the database
and makes searching faster. Only suffixes are supported for now. Prefixes
usually change the meaning of a word: for example, somebody who searches for
"tested" hardly wants "untested" to be found.

To enable ispell support you must specify the Affix and Spell commands
in both the indexer.conf and search.htm files. Note that you can also store
ispell data in the SQL database, using

# indexer -L lang -A affix.file

to load affixes and

# indexer -L lang -D dict.file

to load the dictionary.

Search.cgi and the PHP frontend can be switched to use SQL to normalize words
by specifying IspellMode db in search.htm. In this case the Affix and Spell
commands are not necessary.

Note that search-time ispell support is not yet implemented in every
frontend; it currently works in search.cgi and the PHP frontend only.

Note that ispell commands MUST be given after the LocalCharset definition
in both search.htm and indexer.conf in UdmSearch versions before 3.0.15.


The format of commands:
Affix <lang> <ispell affixes file name>
Spell <lang> <ispell dictionary filename>

The first parameter of both commands is a two-letter language abbreviation.
The second one is a file name. File names are relative to the UdmSearch /etc
directory. Absolute paths can also be specified.

Note that several languages can be loaded at the same time.

For example,

Affix en en.aff
Spell en en.dict
Affix de de.aff
Spell de de.dict

will load ispell support for both English and German.

An ispell affix file contains rules for words and has the following format:

flag V:
    E           >       -E,IVE          # As in create > creative
    [^E]        >       IVE             # As in prevent > preventive

flag *N:
    E           >       -E,ION          # As in create > creation
    Y           >       -Y,ICATION      # As in multiply > multiplication
    [^EY]       >       EN              # As in fall > fallen
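
The rules above can be read mechanically. What follows is a minimal sketch
in Python, not UdmSearch's own code. The function names parse_rule,
apply_rule and undo_rule are hypothetical. It assumes the usual ispell
reading of a suffix rule: the left side is a condition on the end of the
word, "-X,Y" means remove X and then append Y, and a bare "Y" means append Y.

```python
import re

def parse_rule(line):
    """Parse one suffix rule such as 'E > -E,IVE  # comment'."""
    line = line.split("#", 1)[0]          # drop the trailing comment
    cond, change = (part.strip() for part in line.split(">", 1))
    strip, append = "", change
    if change.startswith("-"):            # '-E,IVE': remove E, add IVE
        strip, append = change[1:].split(",", 1)
    return cond.upper(), strip.upper(), append.upper()

def apply_rule(word, rule):
    """Return the derived form, or None if the condition does not match."""
    cond, strip, append = rule
    w = word.upper()
    if not re.search(cond + "$", w):
        return None
    if strip:
        w = w[: -len(strip)]
    return w + append

def undo_rule(word, rule):
    """Return a candidate base form, or None if the suffix is absent."""
    cond, strip, append = rule
    w = word.upper()
    if not append or not w.endswith(append):
        return None
    base = w[: -len(append)] + strip
    # The candidate must itself satisfy the rule's condition.
    return base if re.search(cond + "$", base) else None

rule_e = parse_rule("    E           >       -E,IVE          # As in create > creative")
rule_y = parse_rule("    Y           >       -Y,ICATION      # As in multiply > multiplication")
print(apply_rule("create", rule_e))
print(apply_rule("multiply", rule_y))
print(undo_rule("creative", rule_e))
```

undo_rule only proposes a candidate. A real normalizer would still have to
check that the candidate is in the dictionary and carries the matching flag.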


An ispell dictionary file contains the words themselves, in a format like this:

wop/S
word/DGJMS
wordage/S
wordbook
wordily
wordless/P
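
As an illustration of the dictionary format, here is a small Python sketch
that splits each line into a word and its set of flags. The name
parse_dict_line is hypothetical, and the sketch assumes that every flag is
a single character, as in the sample above.

```python
def parse_dict_line(line):
    """Split 'word/FLAGS' into (word, set of single-letter flags)."""
    entry = line.strip()
    word, _, flags = entry.partition("/")   # flags is '' when there is no '/'
    return word, set(flags)

sample = """wop/S
word/DGJMS
wordage/S
wordbook
wordily
wordless/P"""

# Map every word to the affix flags that may be applied to it.
entries = dict(parse_dict_line(line) for line in sample.splitlines())
print(sorted(entries["word"]))
print(entries["wordbook"])
```

A word without flags, such as "wordbook", is accepted only in the form
given in the file.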

Note that if you add ispell support to an already existing database,
reindexing is required. Otherwise non-normalized words will not be found at all.


Checking site against correct spelling
======================================

You may change the word weight factors depending on whether a word
is found in the ispell dictionaries or not. Two indexer.conf
commands are available (both with default value 1):

IspellCorrectFactor	1
IspellIncorrectFactor	1

Setting "IspellCorrectFactor" to 0 will prevent the indexer from storing
correctly spelled words in the database. Only incorrect words will be
stored in the database in this case. Then you may easily find incorrect words
and the corresponding URLs where those words are found. If no ispell files are
used, all words are considered "incorrect".
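
The effect of the two factors can be pictured with a short Python sketch.
This is only an illustration under stated assumptions, not the indexer's
actual logic: the function name stored_words is hypothetical, and it
assumes that the factor simply multiplies a base weight and that a word
whose resulting weight is 0 is not stored.

```python
def stored_words(words, dictionary, correct_factor=1, incorrect_factor=1):
    """Return the (word, weight) pairs that would be kept.

    Assumption: weight = base weight * factor, and zero-weight words
    are dropped.
    """
    kept = []
    for word, base_weight in words:
        factor = correct_factor if word in dictionary else incorrect_factor
        weight = base_weight * factor
        if weight > 0:
            kept.append((word, weight))
    return kept

found = [("test", 1), ("tesst", 1), ("webmaster", 1)]
known = {"test"}

# With IspellCorrectFactor 0 only words missing from the dictionary remain.
print(stored_words(found, known, correct_factor=0))
```

With an empty dictionary every word counts as incorrect, so nothing is
dropped even when the correct factor is 0.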

It is possible that your site contains several rare words which are not
in the ispell dictionaries. You may list such words in a plain text file
of this format (one word per line):

rare.dict:
----------
webmaster
intranet
.......
www
http
---------

You may also use ispell flags in this file if you know how to :-)
This saves you from listing the same word with different endings in the
rare words file, for example "webmaster" and "webmasters". You may pick
a word with the same inflection rules from an existing ispell dictionary
and just copy its flags. For example, the English dictionary has this line:

postmaster/MS
  
So webmaster with the MS flags will probably be OK:

webmaster/MS


Then copy this file to the /etc directory of UdmSearch and add
it with the Spell command, for example:

Spell en rare.dict

During the next reindexing the new words will be considered correctly
spelled. Only the really incorrect words will remain.
