
                              EMBOSS: etandem
     _________________________________________________________________
   
                                Program etandem
                                       
Function

   Looks for tandem repeats in a nucleotide sequence
   
Description

   etandem looks for tandem repeats in a sequence. It is normally used
   after equicktandem has been run to identify potential repeat sizes. It
   calculates a consensus for the repeat region and reports a score: the
   number of matches to the consensus minus the number of mismatches.
   
   Input sequences are converted into ACGT or N (so ambiguity codes are
   ignored).
   The score is +1 for a match, -1 for a mismatch.
   The first copy of a repeat is ignored.
   The highest score is kept for each start position and repeat size.
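
   The scoring rules above can be illustrated with a minimal sketch. It
   is not the etandem source code: it scores a region against a
   consensus that is already known, whereas etandem derives the
   consensus itself. The function names are hypothetical, and treating
   every non-ACGT character (including N) as a mismatch is an
   assumption made for the illustration.

```python
def normalize(seq: str) -> str:
    """Upper-case the sequence and map anything other than A, C, G, T to N.

    Assumption: every non-ACGT character, including ambiguity codes,
    becomes N.
    """
    return "".join(base if base in "ACGT" else "N" for base in seq.upper())


def score_against_consensus(region: str, consensus: str) -> int:
    """Score a repeat region: +1 per match, -1 per mismatch.

    The first copy of the repeat is skipped, as the description states.
    The region is assumed to start in phase with the consensus.
    Assumption: N never counts as a match in this sketch.
    """
    region = normalize(region)
    consensus = normalize(consensus)
    size = len(consensus)
    score = 0
    for i in range(size, len(region)):  # start after the first copy
        base = region[i]
        if base != "N" and base == consensus[i % size]:
            score += 1
        else:
            score -= 1
    return score


# Five perfect copies of a 6-base repeat: 4 scored copies * 6 bases.
perfect = score_against_consensus("aaccct" * 5, "aaccct")
# Same region with one substituted base in the third copy.
one_off = score_against_consensus(
    "aaccct" * 2 + "aagcct" + "aaccct" * 2, "aaccct"
)
print(perfect, one_off)
```

   A single substitution lowers the score by 2, because a base that
   would have scored +1 scores -1 instead.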
   
   The lowest score to be reported is set by the threshold score, which
   can be set on the command line using the -threshold qualifier; the
   default is 20. For perfect repeats, the score is the length of the
   repeat region, not counting the first copy. Reduce the threshold
   score a little if you wish to allow mismatches. Each mismatch scores
   -1 instead of +1, so it scores 2 less than a perfect match of the
   same number of bases.
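
   The threshold arithmetic can be worked through in a few lines. This
   is a sketch of the rule as stated in the paragraph above, assuming
   that all mismatches fall outside the first (unscored) copy. The
   function names are hypothetical and are not part of EMBOSS.

```python
def perfect_score(size: int, copies: int) -> int:
    """Score of a perfect repeat: every base after the first copy scores +1."""
    return size * (copies - 1)


def score_with_mismatches(size: int, copies: int, mismatches: int) -> int:
    """Each mismatch turns a +1 into a -1, so it costs 2 points."""
    return perfect_score(size, copies) - 2 * mismatches


def max_mismatches(size: int, copies: int, threshold: int = 20) -> int:
    """Largest mismatch count that still reaches the threshold.

    Returns -1 if even a perfect repeat scores below the threshold.
    """
    margin = perfect_score(size, copies) - threshold
    return margin // 2 if margin >= 0 else -1


# Five perfect copies of a 6-base repeat score 6 * 4 = 24.
print(perfect_score(6, 5))
# With the default threshold of 20, that repeat tolerates 2 mismatches.
print(max_mismatches(6, 5))
# Four copies of a 6-base repeat score at most 18, below the default.
print(max_mismatches(6, 4))
```

   Under these assumptions, four copies of a 6-base repeat can never be
   reported at the default threshold, which is one reason to lower
   -threshold when looking for short repeat regions.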
   
   Running with a wide range of repeat sizes is inefficient. That is why
   equicktandem was written - to give a rapid estimate of the major
   repeat sizes.
   
Usage

   Here is a sample session with etandem. The input sequence is the human
   herpesvirus tandem repeat.
   
% etandem
Input sequence: embl:hhtetra
Output file [hhtetra.tan]:
Minimum repeat size [10]: 6
Maximum repeat size [6]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
   -minrepeat          integer    Minimum repeat size
   -maxrepeat          integer    Maximum repeat size
  [-outfile]           report     (no help text) report value

   Optional qualifiers: (none)
   Advanced qualifiers:
   -threshold          integer    Threshold score
   -mismatch           bool       Allow N as a mismatch
   -uniform            bool       Allow uniform consensus
   -origfile           outfile    Output file name

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Qualifier        Description                  Allowed values                         Default
   Mandatory qualifiers
   [-sequence]      Sequence USA                 Readable sequence                      Required
   (Parameter 1)
   -minrepeat       Minimum repeat size          Integer, 2 or higher                   10
   -maxrepeat       Maximum repeat size          Integer, same as -minrepeat or higher  Same as -minrepeat
   [-outfile]       (no help text) report value  Report file
   (Parameter 2)
   Optional qualifiers
   (none)
   Advanced qualifiers
   -threshold       Threshold score              Any integer value                      20
   -mismatch        Allow N as a mismatch        Yes/No                                 No
   -uniform         Allow uniform consensus      Yes/No                                 No
   -origfile        Output file name             Output file                            <sequence>.etandem
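
   For scripted use, the qualifiers listed above can be assembled into a
   non-interactive command line. The helper below is a hypothetical
   sketch, not part of EMBOSS: it only builds the argument list and
   does not run etandem. It uses the qualifier names and defaults
   documented here and applies the documented range checks itself.

```python
import shlex
from typing import List, Optional


def etandem_command(
    sequence: str,
    outfile: str,
    minrepeat: int = 10,
    maxrepeat: Optional[int] = None,
    threshold: int = 20,
    rformat: Optional[str] = None,
) -> List[str]:
    """Build an argument list for etandem from the documented qualifiers."""
    if minrepeat < 2:
        raise ValueError("-minrepeat must be 2 or higher")
    if maxrepeat is None:
        maxrepeat = minrepeat  # documented default: same as -minrepeat
    if maxrepeat < minrepeat:
        raise ValueError("-maxrepeat must be -minrepeat or higher")
    argv = [
        "etandem",
        "-sequence", sequence,
        "-minrepeat", str(minrepeat),
        "-maxrepeat", str(maxrepeat),
        "-threshold", str(threshold),
        "-outfile", outfile,
    ]
    if rformat is not None:
        argv += ["-rformat", rformat]  # report style, e.g. "table"
    return argv


# The sample session from the Usage section, expressed as one command.
argv = etandem_command("embl:hhtetra", "hhtetra.tan", minrepeat=6)
print(shlex.join(argv))
```

   The resulting list can be passed to subprocess.run on a system where
   EMBOSS is installed.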
   
Input file format

   The input for etandem is a nucleotide sequence.
   
Output file format

   The output is a standard EMBOSS report file.
   
   The results can be output in one of several styles by using the
   command-line qualifier -rformat xxx, where 'xxx' is replaced by the
   name of the required format. The available format names are: embl,
   genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel,
   feattable, motif, regions, seqtable, simple, srs, table, tagseq.
   
   See:
   http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for
   further information on report formats.
   
   By default etandem writes a 'table' report file.
   
   The output from the above example is:
     _________________________________________________________________
   
########################################
# Program: etandem
# Rundate: Thu Apr 11 13:31:10 2002
# Report_file: stdout
########################################

#=======================================
#
# Sequence: HHTETRA     from: 1   to: 1272
# HitCount: 5
#
# Threshold: 20
# Minrepeat: 6
# Maxrepeat: 6
# Mismatch: No
# Uniform: No
#
#=======================================

  Start     End   Score   Size  Count Identity Consensus
    793     936     120      6     24     93.8 acccta
    283     420      90      6     23     84.8 taaccc
    432     485      38      6      9     90.7 ccctaa
    494     529      26      6      6     94.4 ccctaa
    568     597      24      6      5    100.0 aaccct

#---------------------------------------
#---------------------------------------
     _________________________________________________________________
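
   The hit lines of the default 'table' report can be read with a short
   parser. The sketch below assumes the seven-column layout shown in
   the example output and has not been checked against other report
   styles. The function name and dictionary keys are hypothetical.

```python
REPORT = """\
# Threshold: 20
  Start     End   Score   Size  Count Identity Consensus
    793     936     120      6     24     93.8 acccta
    283     420      90      6     23     84.8 taaccc
    432     485      38      6      9     90.7 ccctaa
    494     529      26      6      6     94.4 ccctaa
    568     597      24      6      5    100.0 aaccct
#---------------------------------------
"""


def parse_rows(text: str) -> list:
    """Return one dict per hit line; comment and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A hit line has seven fields and begins with a numeric start.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        start, end, score, size, count = (int(f) for f in fields[:5])
        rows.append({
            "start": start,
            "end": end,
            "score": score,
            "size": size,
            "count": count,
            "identity": float(fields[5]),
            "consensus": fields[6],
        })
    return rows


rows = parse_rows(REPORT)
for row in rows:
    # In the sample output each region spans exactly size * count bases.
    span = row["end"] - row["start"] + 1
    print(row["consensus"], span, row["size"] * row["count"])
```

   In the sample output the region length (End - Start + 1) equals
   Size * Count for every hit. This is an observation about the example
   rows rather than a documented guarantee.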
   
Data files

Notes

   Running with a wide range of repeat sizes is inefficient. That is why
   equicktandem was written - to give a rapid estimate of the major
   repeat sizes.
   
References

   None.
   
Warnings

   None.
   
Diagnostics

   None.
   
Exit status

   None.
   
Known bugs

   None.
   
See also

   Program name                     Description
   einverted    Finds DNA inverted repeats
   equicktandem Finds tandem repeats
   palindrome   Looks for inverted repeats in a nucleotide sequence
   
Authors

   This program was originally written by Richard Durbin at the Sanger
   Centre.
   
   This application was modified for inclusion in EMBOSS by Peter Rice
   (pmr@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome
   Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
   
Priority

   Completed 25 May 1999
   
Target

   etandem is aimed at automated repeat identification in genomic
   sequence but can also be used by general users.
   
Comments
