
                            EMBOSS: supermatcher
     _________________________________________________________________
   
                             Program supermatcher
                                       
Function

   Finds a match of a large sequence against one or more sequences
   
Description

   This is a rough-and-ready local alignment program for large sequences.
   It first uses wordmatch to find all word matches between the first
   sequence and another sequence. It then scores the diagonals and takes
   the highest-scoring one as the centre line for a Smith-Waterman type
   calculation restricted to a width given by the user. Because only a
   narrow band around that diagonal is calculated, the results are
   approximate, but the space saved allows much larger sequences to be
   aligned.
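
   The diagonal-selection step can be sketched as follows. This is an
   illustration only, not the supermatcher source: the function names
   are hypothetical, and a diagonal is scored here simply by counting
   its exact word hits, which is an assumption about how the "highest
   score for a diagonal" is calculated.

```python
from collections import defaultdict


def word_matches(a, b, wordlen):
    """Return (i, j) start positions where a[i:i+wordlen] == b[j:j+wordlen]."""
    # Index every word of b by its text so each word of a is looked up once.
    index = defaultdict(list)
    for j in range(len(b) - wordlen + 1):
        index[b[j:j + wordlen]].append(j)
    hits = []
    for i in range(len(a) - wordlen + 1):
        for j in index.get(a[i:i + wordlen], ()):
            hits.append((i, j))
    return hits


def best_diagonal(hits):
    """Return the diagonal (j - i) holding the most word hits, or None."""
    # Assumption: score = number of hits on the diagonal.
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    if not counts:
        return None  # no word matches, so no start point for the alignment
    return max(counts, key=counts.get)


core = "ACGTTGCAAGCT"
a = "TTT" + core + "CC"   # core starts at offset 3
b = "G" + core            # core starts at offset 1
hits = word_matches(a, b, wordlen=6)
diagonal = best_diagonal(hits)
```

   In this example every hit lies on the diagonal j - i = -2, which
   would become the centre of the banded calculation.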
   
Usage

   Here is a sample session with supermatcher.

 supermatcher ~/wordtest/U68037 ~/wordtest/AB003171 -noscoreonly

Finds a match of a large sequence against one or more sequences
Gap opening penalty [10.0]:
Gap extension penalty [0.5]: 3.0
Output file [stdout]:
Local: RNU68037 vs EM:AB003171
Score: 30.00

RNU68037        820   tcaaccacagctgccctccgcagctctcggggag.gcggc.tccg 862
                      ||||| ||||   || |  ||||     |||||| ||  | |||
EM:AB003171     2492  tcaactacag.aaccatgtgcag....aggggagagctccatcct 2531

RNU68037        863   cgcgcagggttcacgcacacga.cgtgg.aaatggtgggccagct 905
                       |  ||  ||| | || || || | ||| ||   ||||   |  |
EM:AB003171     2532  tgaaca..gttaaagc.ca.gagcttggtaacaagtggataaatt 2572

RNU68037        906   .cgtgggcatcatggtggtgtc.gtg..catctgctggagc     942
                       | |    |||||  ||  ||| | |  ||  |||  ||||
EM:AB003171     2573  acat....atcattttgcggtctgagaacacatgc.agagc      2608


Command line arguments

   Mandatory qualifiers:
  [-seqa]              seqall     Sequence database USA
  [-seqb]              seqset     Sequence set USA
   -gapopen            float      Gap opening penalty
   -gapextend          float      Gap extension penalty
   -outfile            align      (no help text) align value

   Optional qualifiers:
   -datafile           matrixf    Matrix file
   -width              integer    Alignment width
   -wordlen            integer    word length for initial matching
   -errorfile          outfile    Error file to be written to

   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers    Description                       Allowed values                  Default
   [-seqa] (Parameter 1)   Sequence database USA             Readable sequence(s)            Required
   [-seqb] (Parameter 2)   Sequence set USA                  Readable sequences              Required
   -gapopen                Gap opening penalty               Number from 1.000 to 100.000    10.0 for any sequence type
   -gapextend              Gap extension penalty             Number from 0.100 to 10.000     0.5 for any sequence type
   -outfile                (no help text) align value        Alignment file

   Optional qualifiers     Description                       Allowed values                  Default
   -datafile               Matrix file                       Comparison matrix file in       EBLOSUM62 for protein,
                                                             EMBOSS data path                EDNAFULL for DNA
   -width                  Alignment width                   Any integer value               16
   -wordlen                word length for initial matching  Integer 3 or more               6
   -errorfile              Error file to be written to       Output file                     supermatcher.error

   Advanced qualifiers     Description                       Allowed values                  Default
   (none)
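
   For scripted use, the qualifiers above can be assembled into a
   command line. The sketch below only builds the argument list and
   checks the documented ranges; the helper name and the file names are
   hypothetical, and it assumes the usual "-qualifier value" form with
   the two sequences given as the first two parameters, as in the
   sample session.

```python
def supermatcher_argv(seqa, seqb, outfile, gapopen=10.0, gapextend=0.5,
                      width=16, wordlen=6):
    """Build an argument list for supermatcher from documented qualifiers."""
    # Ranges and defaults are taken from the qualifier table.
    if not 1.0 <= gapopen <= 100.0:
        raise ValueError("gapopen must be from 1.0 to 100.0")
    if not 0.1 <= gapextend <= 10.0:
        raise ValueError("gapextend must be from 0.1 to 10.0")
    if wordlen < 3:
        raise ValueError("wordlen must be 3 or more")
    return [
        "supermatcher", seqa, seqb,
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-width", str(width),
        "-wordlen", str(wordlen),
        "-outfile", outfile,
    ]


argv = supermatcher_argv("a.fasta", "b.fasta", "out.align")
# To run it (requires EMBOSS on the PATH):
#   import subprocess
#   subprocess.run(argv)
```

   Since the program is documented as always exiting with status 0, a
   calling script would need to inspect the error file rather than the
   return code to detect a failed run.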
   
Input file format

   Two sequence USAs.
   
Output file format

   supermatcher.error will contain any errors that occurred while the
   program was running. One possible error is that wordmatch could not
   find any matches, so no suitable start point was found for the
   Smith-Waterman calculation.
   
Data files

   For protein sequences, EBLOSUM62 is used as the substitution matrix.
   For nucleotide sequences, EDNAFULL is used. Others can be specified
   with -datafile.
   
Notes

   The time this program takes to do an alignment depends very much on
   the word size. For short sequences, a short word size (e.g. 4) can
   make it take a very long time. Large word sizes (e.g. 30) give a very
   quick result for sequences that are very similar. The default word
   length of 6 should give reasonably fast alignments.
   
   Because it does a Smith & Waterman alignment (albeit in a narrow
   region around the diagonal shown to be the 'best' by a word match),
   this program can use huge amounts of memory if the sequences are
   large.
   
   Because the alignment is made within a narrow area each side of the
   'best' diagonal, if there are sufficient indels between the two
   sequences, then the path of the Smith & Waterman alignment can wander
   outside of this area. Making the width larger can avoid this problem,
   but you then use more memory.
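
   The effect of the width can be seen in a minimal sketch of a banded
   local alignment score. This is not the supermatcher implementation:
   the function name and the scores are illustrative, a single linear
   gap penalty stands in for the separate gap opening and extension
   penalties, and the width is assumed to count cells on each side of
   the chosen diagonal.

```python
def banded_local_score(a, b, diagonal, width, match=5, mismatch=-4, gap=-10):
    """Best local score using only cells with |(j - i) - diagonal| <= width.

    Cells outside the band are treated as score 0, which can never win
    through a gap move because the gap penalty is negative.
    """
    best = 0
    prev = {}  # previous row's scores, keyed by 1-based column j
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i + diagonal - width)
        hi = min(len(b), i + diagonal + width)
        for j in range(lo, hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                          # start a new local alignment
                prev.get(j - 1, 0) + sub,   # align a[i-1] with b[j-1]
                prev.get(j, 0) + gap,       # a[i-1] against a gap
                cur.get(j - 1, 0) + gap,    # b[j-1] against a gap
            )
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


left, right = "ACGTTGCAAGCT", "GGATCCTAGGCA"
a = left + right
b = left + "TT" + right  # two inserted bases shift the second half
narrow = banded_local_score(a, b, diagonal=0, width=0)  # 60
wide = banded_local_score(a, b, diagonal=0, width=2)    # 100
```

   With a width of 0 the alignment cannot follow the two-base insertion
   and only the first half is recovered; a width of 2 is enough for the
   path to step across to the shifted diagonal.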
   
   The longer the sequences and the wider the specified alignment width,
   the more memory will be used.
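
   As a rough guide to how the band grows, the sketch below counts the
   matrix cells that fall inside a band. It counts cells, not the bytes
   that supermatcher actually allocates, and it assumes the width is
   measured on each side of the diagonal; the function name is
   hypothetical.

```python
def band_cells(len_a, len_b, diagonal, width):
    """Count cells (i, j) of a len_a x len_b matrix with |(j - i) - diagonal| <= width."""
    total = 0
    for i in range(len_a):
        lo = max(0, i + diagonal - width)
        hi = min(len_b - 1, i + diagonal + width)
        if hi >= lo:
            total += hi - lo + 1
    return total


banded = band_cells(1000, 1000, diagonal=0, width=16)  # 32728
full = 1000 * 1000                                     # 1000000
```

   The cell count grows roughly in proportion to sequence length times
   band width, compared with the product of both lengths for an
   unrestricted calculation.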
   
   If the program terminates due to lack of memory you can try the
   following:
   
   In csh or tcsh, run the command 'limit' to see whether your stack size
   or memory usage has been limited and, if so, run 'unlimit' (e.g.
   '% unlimit stacksize').
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with a status of 0.
   
Known bugs

   None.
   
See also

   Program name                         Description
   matcher      Finds the best local alignments between two sequences
   seqmatchall  Does an all-against-all comparison of a set of sequences
   water        Smith-Waterman local alignment
   wordmatch    Finds all exact matches of a given size between 2 sequences
   
Author(s)

   This application was written by Ian Longden (il@sanger.ac.uk)
   Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus,
   Hinxton, Cambridge, CB10 1SA, UK.
   
History

   Finished.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
