
                             EMBOSS: einverted
     _________________________________________________________________
   
                               Program einverted
                                       
Function

   Finds DNA inverted repeats
   
Description

   einverted looks for inverted repeats (stem loops) in a nucleotide
   sequence.
   
   It will find inverted repeats that include a proportion of mismatches
   and gaps (bulges in the stem loop).
   
   It works by finding alignments between the sequence and its reverse
   complement that reach a threshold score. Each match is assigned a
   positive score, and each mismatch or gap is assigned a penalty
   (negative score); with the default values a gap is penalised more
   heavily than a mismatch. The score of an alignment is the sum of its
   match scores, mismatch penalties and gap penalties. Any region whose
   score is at least the threshold is reported.
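
   The scoring rule described above can be written out as a short
   calculation. The following is a minimal sketch in Python, not
   einverted's own code: the function names are hypothetical, the default
   values are the ones offered at the prompts in the Usage section, and
   it assumes that every gap position is charged the gap penalty once.

```python
def alignment_score(matches, mismatches, gaps,
                    match=3, mismatch=-4, gap=12):
    """Sum the match scores, mismatch scores and gap penalties.

    The mismatch score is already negative, while the gap penalty is
    given as a positive number and therefore subtracted.
    Assumption: one gap position costs exactly one gap penalty.
    """
    return matches * match + mismatches * mismatch - gaps * gap


def is_reported(score, threshold=50):
    """True if the score is at least the minimum score threshold."""
    return score >= threshold


if __name__ == "__main__":
    # A perfect 11-base stem: 11 matches, no mismatches, no gaps.
    s = alignment_score(11, 0, 0)
    print(s, is_reported(s), is_reported(s, threshold=33))
```

   For a perfect 11-base stem this gives 11 x 3 = 33, which is below the
   default threshold of 50 but would be reported with -threshold 33.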
   
   einverted uses dynamic programming and thus is guaranteed to find the
   optimal alignment, but is slower than, for example, a self-by-self
   BLAST. It can find multiple inverted repeats in a sequence.
   
   Secondary structures like inverted repeats in genomic sequences may be
   implicated in initiation of DNA replication.
   
Usage

   Here is a sample session with einverted.

% einverted
Input sequence: embl:hsts1
Output file [hsts1.inv]:
Gap penalty [12]:
Minimum score threshold [50]:
Match score [3]:
Mismatch score [-4]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
   -gap                integer    Gap penalty
   -threshold          integer    Minimum score threshold
   -match              integer    Match score
   -mismatch           integer    Mismatch score
  [-outfile]           outfile    Output file name

   Optional qualifiers:
   -maxrepeat          integer    Maximum separation between the start of
                                  repeat and the end of the inverted repeat
                                  (the default is 4000 bases).

   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Qualifier                  Description                     Allowed values     Default

   Mandatory qualifiers
   [-sequence] (Parameter 1)  Sequence USA                    Readable sequence  Required
   -gap                       Gap penalty                     Any integer value  12
   -threshold                 Minimum score threshold         Any integer value  50
   -match                     Match score                     Any integer value  3
   -mismatch                  Mismatch score                  Any integer value  -4
   [-outfile] (Parameter 2)   Output file name                Output file        <sequence>.einverted

   Optional qualifiers
   -maxrepeat                 Maximum separation between the  Any integer value  4000
                              start of repeat and the end of
                              the inverted repeat (the
                              default is 4000 bases).

   Advanced qualifiers
   (none)
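
   The qualifiers listed above can also be supplied on the command line
   instead of at the prompts. The following Python sketch only assembles
   such a command line; the function name build_einverted_command is
   hypothetical, the qualifier names and defaults are copied from the
   table above, and passing each value as a separate word after its
   qualifier (including the negative mismatch score) is an assumption
   that has not been checked against a running installation.

```python
def build_einverted_command(sequence, outfile, gap=12, threshold=50,
                            match=3, mismatch=-4, maxrepeat=None):
    """Return an argument list for a non-interactive einverted run.

    Only qualifiers documented on this page are used. The optional
    -maxrepeat qualifier is added only when a value is given.
    """
    cmd = [
        "einverted",
        "-sequence", sequence,
        "-gap", str(gap),
        "-threshold", str(threshold),
        "-match", str(match),
        "-mismatch", str(mismatch),   # assumed to be accepted as "-mismatch -4"
        "-outfile", outfile,
    ]
    if maxrepeat is not None:
        cmd += ["-maxrepeat", str(maxrepeat)]
    return cmd


if __name__ == "__main__":
    # Same input and output names as the sample session.
    print(" ".join(build_einverted_command("embl:hsts1", "hsts1.inv")))
```

   On a system where EMBOSS is installed, the returned list could be
   handed to subprocess.run(); that step is left out here so that the
   sketch runs without einverted being present.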
   
Input file format

   The input for einverted is a nucleotide sequence.
   
Output file format

   Here is the output from the example run. The first two hits have been
   removed because their output lines are too wide to show here.

......................

Score 80: 44/51 ( 86%) matches, 2 gaps
   12246 ctcctgcctcag-cctccaagtagctgggattaca-gcatgtgccaccatgcc 12296
         |||||| ||||| | |||||   |||||||||||| ||||| |||||||| ||
   13938 gaggacagagtcagaaggtttcacgaccctaatgtccgtactcggtggtatgg 13886

Score 99: 53/65 ( 81%) matches, 1 gaps
   13884 tgggtatggtggctcatgcctgtaatcccagcactttggaagactgagacaggagcaattgcttga 13949
         ||||| |||||||   ||||||||||||||||    ||| || ||||| ||| || ||||||||||
   14692 acccacaccaccgtacacggacattagggtcgatggaccctccgactccgtcttc-ttaacgaact 14628
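
   Each hit in the output starts with a summary line giving the score,
   the number of matches, the match percentage and the number of gaps.
   The following Python sketch reads such a line; the function name and
   the dictionary keys are hypothetical, and the pattern is inferred only
   from the sample lines shown on this page, so other versions of the
   program may need a different pattern.

```python
import re

# Pattern inferred from lines such as
# "Score 80: 44/51 ( 86%) matches, 2 gaps"
HEADER = re.compile(
    r"Score\s+(\d+):\s+(\d+)/(\d+)\s+\(\s*(\d+)%\)\s+matches,\s+(\d+)\s+gaps"
)


def parse_hit_header(line):
    """Return the fields of a hit summary line, or None if it does not match."""
    m = HEADER.search(line)
    if m is None:
        return None
    score, matches, aligned, percent, gaps = (int(g) for g in m.groups())
    return {
        "score": score,
        "matches": matches,
        "aligned": aligned,   # denominator of the "matches" fraction
        "percent": percent,
        "gaps": gaps,
    }


if __name__ == "__main__":
    for text in ("Score 80: 44/51 ( 86%) matches, 2 gaps",
                 "Score 99: 53/65 ( 81%) matches, 1 gaps"):
        print(parse_hit_header(text))
```

   With the default scores, both sample hits are consistent with the
   denominator counting aligned base pairs only, excluding gap positions:
   44 x 3 - 7 x 4 - 2 x 12 = 80 and 53 x 3 - 12 x 4 - 1 x 12 = 99.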

Data files

   None.
   
Notes

   Sometimes you can find repeats using the program palindrome that you
   can't find with einverted using the default parameters.
   
   This is not due to a problem with either program. It is simply because
   some of the shortest repeats that you find with palindrome's default
   parameter values are below einverted's default cutoff score - you
   should decrease the 'Minimum score threshold' to see them.
   
   For example, when palindrome is run with 'em:hsfau1', it finds the
   repeat:
   
64    aaaactaaggc    74
      |||||||||||
98    ttttgattccg    88

   einverted will not report this as its score is 33 (11 bases scoring 3
   each, no mismatches or gaps), which is below the default score cutoff
   of 50.
   
   If einverted is run as:
   
   % einverted em:hsfau1 -threshold 33
   
   then it will find it:
   
Score 33: 11/11 (100%) matches, 0 gaps
      64 aaaactaaggc 74
         |||||||||||
      98 ttttgattccg 88

   (Anything can be considered to be a repeat if you set the score
   threshold low enough!)
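
   In the displays above, the lower line is the second arm of the repeat
   written in descending coordinate order, and each '|' marks a
   complementary base pair. The following Python sketch counts those
   pairs for two displayed lines; the function names are hypothetical,
   and it assumes lower-case a, c, g and t only, with '-' marking a gap
   position.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def count_pairs(top, bottom):
    """Return (matches, mismatches, gaps) for two displayed lines.

    A column is a match when the lower base is the complement of the
    upper base, a gap when either line has '-', and a mismatch otherwise.
    """
    matches = mismatches = gaps = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1
        elif COMPLEMENT.get(x) == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def reverse_complement(seq):
    """Reverse the sequence and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    top, bottom = "aaaactaaggc", "ttttgattccg"
    print(count_pairs(top, bottom))
    # The lower line reversed is the second arm in ascending order
    # (positions 88 to 98); its reverse complement is the first arm.
    print(reverse_complement(bottom[::-1]) == top)
```

   For the hsfau1 repeat this counts 11 matches with no mismatches or
   gaps, which at the default match score of 3 gives the score of 33
   quoted above.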
   
References

   Some assorted references on inverted repeats:
   
    1. Pearson CE, Zorbas H, Price GB, Zannis-Hadjopoulos M. Inverted
       repeats, stem-loops, and cruciforms: significance for initiation
       of DNA replication. J Cell Biochem. 1996 Oct;63(1):1-22.
    2. Waldman AS, Tran H, Goldsmith EC, Resnick MA. Long inverted
       repeats are an at-risk motif for recombination in mammalian cells.
       Genetics. 1999 Dec;153(4):1873-83. PMID: 10581292; UI: 20050682
    3. Jacobsen SE. Gene silencing: Maintaining methylation patterns. Curr
       Biol. 1999 Aug 26;9(16):R617-9.
    4. Lewis S, Akgun E, Jasin M. Palindromic DNA and genome stability.
       Further studies. Ann N Y Acad Sci. 1999 May 18;870:45-57. PMID:
       10415472; UI: 99343961
    5. Dai X, Greizerstein MB, Nadas-Chinni K, Rothman-Denes LB.
       Supercoil-induced extrusion of a regulatory DNA hairpin. Proc Natl
       Acad Sci U S A. 1997 Mar 18;94(6):2174-9.
       
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with a status of 0.
   
Known bugs

   None.
   
See also

   Program name                     Description
   equicktandem Finds tandem repeats
   etandem      Looks for tandem repeats in a nucleotide sequence
   palindrome   Looks for inverted repeats in a nucleotide sequence
   
   palindrome also looks for inverted repeats but is much faster and less
   sensitive, as it looks for near-perfect repeats.
   
Author(s)

   This program was originally written by Richard Durbin at the Sanger
   Centre.
   
   This application was modified for inclusion in EMBOSS by Peter Rice
   (pmr@sanger.ac.uk) Informatics Division, The Sanger Centre, Wellcome
   Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
   
History

   Written (1999) - Peter Rice
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
