
                             EMBOSS: wordmatch
     _________________________________________________________________
   
                               Program wordmatch
                                       
Function

   Finds all exact matches of a given minimum size between two sequences
   
Description

   Finds all exact matches of a given minimum size between two sequences,
   displaying the start positions in each sequence and the length of each
   match.
   
   This program takes two sequences and finds the regions where they are
   identical. These regions are reported in the output file and,
   optionally, in GFF (General Feature Format) feature files.
   
   It does not report identical regions shorter than the specified word
   size (-wordsize).
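   The internals of wordmatch are not described here. The sketch below
   shows one common way to find such regions, offered only as an
   illustration under stated assumptions: every word of length wordsize in
   the first sequence is indexed, each word of the second sequence is
   looked up in that index, and each hit is extended to the right. Hits
   whose preceding residues also agree are skipped, so each identical
   region is reported once, from its start. The function name word_matches
   is hypothetical, and the order of results need not match wordmatch's
   own output.

```python
from collections import defaultdict


def word_matches(a: str, b: str, wordsize: int = 4) -> list[tuple[int, int, int]]:
    """Hypothetical helper: return every maximal identical region of length
    >= wordsize as (start in a, start in b, length), 1-based starts."""
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    a, b = a.upper(), b.upper()

    # Index every word of length `wordsize` in a by its start positions.
    index = defaultdict(list)
    for i in range(len(a) - wordsize + 1):
        index[a[i:i + wordsize]].append(i)

    matches = []
    for j in range(len(b) - wordsize + 1):
        for i in index.get(b[j:j + wordsize], []):
            # If the residues just before both words also agree, this hit lies
            # inside a longer region that is reported from its own start.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed word to the right while both sequences agree.
            length = wordsize
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i + 1, j + 1, length))
    return matches
```

   For example, word_matches("ACGTACGT", "TTACGTAA") returns
   [(4, 2, 5), (1, 3, 5)]: two identical regions of length 5. A region of
   only 3 identical residues, as in word_matches("ACGA", "ACGT"), is not
   reported with the default word size of 4.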
   
Usage

   Here is a sample session with wordmatch.

% wordmatch sw:hba_human sw:hbb_human
Output file [hba_human.wordmatch]:
Word size [4]:

Command line arguments

   Mandatory qualifiers:
  [-asequence]         sequence   Sequence USA
  [-bsequence]         sequence   Sequence USA
   -wordsize           integer    Word size
  [-outfile]           align      Output alignment file

   Optional qualifiers: (none)
   Advanced qualifiers:
   -afeatout           featout    File for output of the first sequence's
                                  tab delimited GFF features
   -bfeatout           featout    File for output of the second sequence's
                                  tab delimited GFF features

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers  Description                 Allowed values           Default
   [-asequence]          Sequence USA (parameter 1)  Readable sequence        Required
   [-bsequence]          Sequence USA (parameter 2)  Readable sequence        Required
   -wordsize             Word size                   Integer 2 or more        4
   [-outfile]            Output file (parameter 3)   Alignment file           <asequence name>.wordmatch

   Optional qualifiers   (none)

   Advanced qualifiers   Description                 Allowed values           Default
   -afeatout             GFF features output file    Writeable feature table  unknown.gff
                         (tab delimited)
   -bfeatout             GFF features output file    Writeable feature table  unknown.gff
                         (tab delimited)
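   Because every mandatory qualifier can be given on the command line,
   wordmatch can be driven from a script. The minimal Python sketch below
   assumes that the wordmatch binary is on PATH and that, as the sample
   session suggests, the program prompts only for values missing from the
   command line. The helper names build_wordmatch_command and run_wordmatch
   and the default output name out.wordmatch are hypothetical; only the
   qualifiers listed above are used.

```python
import shutil
import subprocess


def build_wordmatch_command(asequence: str, bsequence: str,
                            wordsize: int, outfile: str) -> list[str]:
    """Hypothetical helper: assemble a wordmatch command line that supplies
    every mandatory qualifier, so there is nothing left to prompt for."""
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    return ["wordmatch", asequence, bsequence,
            "-wordsize", str(wordsize), "-outfile", outfile]


def run_wordmatch(asequence: str, bsequence: str,
                  wordsize: int = 4, outfile: str = "out.wordmatch") -> str:
    """Hypothetical helper: run wordmatch and return the output file name."""
    if shutil.which("wordmatch") is None:
        raise FileNotFoundError("wordmatch was not found on PATH")
    cmd = build_wordmatch_command(asequence, bsequence, wordsize, outfile)
    # wordmatch exits with status 0 on success; check=True raises otherwise.
    subprocess.run(cmd, check=True)
    return outfile
```

   Calling run_wordmatch("sw:hba_human", "sw:hbb_human", 4,
   "hba_human.wordmatch") corresponds to the sample session shown under
   Usage, with the prompted answers given on the command line instead.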
   
Input file format

   Any two sequence USAs of the same type (DNA or protein).
   
Output file format

   The file produced in the above example is:
     _________________________________________________________________
   
FINALLY length = 3
 HBA_HUMAN  HBB_HUMAN Length
        58          63          5
        14          15          4
       116         121          4
     _________________________________________________________________
   
   The first line ('FINALLY...') gives the number of regions found.
   
   The next line gives the headers for the subsequent columns of data. It
   consists of the names of the two sequences and the word 'Length'.
   
   Each subsequent line describes one identical region as three numbers
   separated by spaces or TAB characters: the start position of the region
   in the first sequence, its start position in the second sequence, and
   its length.
   
   If no regions are found, the output file is blank.
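   A minimal Python sketch of reading this format back into a script,
   assuming the layout of the sample output above: a 'FINALLY length = N'
   line, a header line, then one row of three integers per region, or an
   empty file when nothing is found. parse_wordmatch is a hypothetical
   helper, not part of EMBOSS.

```python
def parse_wordmatch(text: str) -> tuple[list[str], list[tuple[int, int, int]]]:
    """Hypothetical helper: split wordmatch output into (column headers, rows),
    each row being (start in first sequence, start in second, length)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [], []  # a blank file means no regions were found
    if len(lines) < 2 or not lines[0].startswith("FINALLY"):
        raise ValueError("expected a 'FINALLY length = N' line and a header line")
    expected = int(lines[0].split("=")[1])
    header = lines[1].split()
    rows = []
    for line in lines[2:]:
        # Columns may be separated by spaces or TAB characters.
        start_a, start_b, length = (int(field) for field in line.split())
        rows.append((start_a, start_b, length))
    if len(rows) != expected:
        raise ValueError(f"expected {expected} regions, found {len(rows)}")
    return header, rows
```

   The end position of a region is start + length - 1, so the first region
   in the example covers residues 58 to 62 of HBA_HUMAN and 63 to 67 of
   HBB_HUMAN.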
   
Data files

   None.
   
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   0 if successful.
   
Known bugs

   None.
   
See also

   Program name   Description
   matcher        Finds the best local alignments between two sequences
   seqmatchall    Does an all-against-all comparison of a set of sequences
   supermatcher   Finds a match of a large sequence against one or more
                  sequences
   water          Smith-Waterman local alignment
   
Author(s)

   This application was written by Ian Longden (il@sanger.ac.uk)
   Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus,
   Hinxton, Cambridge, CB10 1SA, UK.
   
History

   Completed 27th November 1998.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
