
                                EMBOSS: dreg
     _________________________________________________________________
   
                                 Program dreg
                                       
Function

   regular expression search of a nucleotide sequence
   
Description

   dreg searches a nucleic acid sequence for matches to a regular
   expression.
   
   A regular expression is a way of specifying an ambiguous pattern to
   search for. Regular expressions are widely used in programming
   languages such as Perl and in Unix tools such as grep, so users of
   those tools may already be familiar with them.
   
   The following is a short guide to regular expressions in EMBOSS:
   
   ^
          use this at the start of a pattern to insist that the pattern
          can only match at the start of a sequence. (eg. '^AUG' matches
          a start codon at the start of the sequence)
          
   $
          use this at the end of a pattern to insist that the pattern can
          only match at the end of a sequence (eg. 'A+$' matches a poly-A
          sequence at the end of the sequence)
          
   ()
          groups a pattern. This is commonly used with '|' (eg.
          '(AUG)|(ATG)' matches either the RNA or the DNA form of the
          initiation codon)
          
   |
          This is the OR operator to enable a match to be made to either
          one pattern OR another. There is no AND operator in this
          version of regular expressions.
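   The anchors, grouping and OR operator described above can be tried
   out in a few lines. This is a sketch that assumes Python's re module
   as a stand-in for the EMBOSS regular expression engine; the two agree
   on these basic operators, but this is not dreg's own code. The
   sequences are made up for illustration.

```python
import re

# Python's re module stands in for the EMBOSS regular expression engine here.
seq = "AUGGCCAAAAA"   # hypothetical example sequence

# '^' anchors the pattern to the start of the sequence.
starts_with_aug = re.search(r"^AUG", seq) is not None   # True
starts_with_gcc = re.search(r"^GCC", seq) is not None   # False: GCC occurs, but not at the start

# '$' anchors the pattern to the end of the sequence.
poly_a = re.search(r"A+$", seq).group()                 # 'AAAAA', the trailing poly-A run

# '(...)|(...)' matches either alternative (RNA or DNA start codon).
start_codon = re.compile(r"(AUG)|(ATG)")
hits = [start_codon.search(s) is not None for s in ("CCAUGC", "CCATGC", "CCAAGC")]

print(starts_with_aug, starts_with_gcc, poly_a, hits)
# True False AAAAA [True, True, False]
```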
          
   The following quantifier characters specify the number of times that
   the character before them (in these examples 'x') matches:
   
   x?
          matches 0 or 1 times (ie, '' or 'x')
          
   x*
          matches 0 or more times (ie, '' or 'x' or 'xx' or 'xxx', etc)
          
   x+
          matches 1 or more times (ie, 'x' or 'xx' or 'xxx', etc)
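   The difference between the three quantifiers is easiest to see by
   testing whole strings. This sketch again assumes Python's re module
   as a stand-in; it uses re.fullmatch to show exactly which strings a
   quantified pattern accepts, whereas dreg itself searches for the
   pattern anywhere in the sequence. The helper name accepts is
   hypothetical.

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

texts = ("GC", "GAC", "GAAC")
# Each quantifier applies only to the 'A' immediately before it.
table = {p: [accepts(p, t) for t in texts] for p in ("GA?C", "GA*C", "GA+C")}

for pattern, row in table.items():
    print(pattern, dict(zip(texts, row)))
# GA?C {'GC': True, 'GAC': True, 'GAAC': False}
# GA*C {'GC': True, 'GAC': True, 'GAAC': True}
# GA+C {'GC': False, 'GAC': True, 'GAAC': True}
```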
          
   Quantifiers can follow any of the following types of character
   specification:
   
   x
          any single literal character (ie 'A')
          
   \x
          the character after the backslash is used instead of its normal
          regular expression meaning. This is commonly used to turn off
          the special meaning of the characters '^$()|?*+[]-.'. It may be
          especially useful when searching for gap characters in a
          sequence (eg '\.' matches only a dot character '.')
          
   [xy]
          match one of the characters 'x' or 'y'. You may have one or
          more characters in this set.
          
   [x-z]
          match any one of the set of characters starting with 'x' and
          ending with 'z' in ASCII order (eg '[A-G]' matches any one of:
          'A', 'B', 'C', 'D', 'E', 'F', 'G')
          
   [^x-z]
          matches any one character except those in the range 'x' to
          'z' in ASCII order (eg '[^A-G]' matches anything EXCEPT any one
          of: 'A', 'B', 'C', 'D', 'E', 'F', 'G')
          
   .
          the dot character matches any single character (eg: 'A.G'
          matches 'AAG', 'AaG', 'AZG', 'A-G', 'A G', etc.)
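   Each character specification above can be checked against a few
   candidate strings. This is a sketch assuming Python's re module as a
   stand-in for the EMBOSS engine; the helper which_match is
   hypothetical and simply lists the candidates that the pattern matches
   in full.

```python
import re

def which_match(pattern: str, texts: list[str]) -> list[str]:
    """Return the texts that pattern matches in full (hypothetical helper)."""
    return [t for t in texts if re.fullmatch(pattern, t)]

print(which_match(r"A", ["A", "a", "C"]))            # ['A']  literal, case-sensitive
print(which_match(r"\.", [".", "A"]))                # ['.']  escaped dot is literal
print(which_match(r"[AG]", ["A", "G", "C"]))         # ['A', 'G']  character set
print(which_match(r"[A-G]", ["A", "D", "T"]))        # ['A', 'D']  ASCII range
print(which_match(r"[^A-G]", ["A", "T", "-"]))       # ['T', '-']  negated range
print(which_match(r"A.G", ["AAG", "AaG", "A-G", "A G", "AG"]))
# ['AAG', 'AaG', 'A-G', 'A G']  '.' needs exactly one character
```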
          
   Combining some of these features gives the example:
'([AGC]+GGG)|(TTTGGG)'

   which matches a run of one or more characters, each of which is 'A',
   'G' or 'C', followed by three 'G's, or else matches 'TTTGGG'.
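   To see how the combined example behaves, the sketch below searches a
   few short made-up strings with it. It assumes Python's re module as a
   stand-in for the EMBOSS engine. Note that 'GGG' on its own does not
   match, because '[AGC]+' needs at least one character before the
   three 'G's.

```python
import re

pattern = re.compile(r"([AGC]+GGG)|(TTTGGG)")

results = {}
for text in ("CAGGGG", "TTTGGG", "GGG", "TTTGG"):
    m = pattern.search(text)
    results[text] = m.group() if m else None
    print(text, "->", results[text])
# CAGGGG -> CAGGGG   ('CAG' from [AGC]+, then 'GGG')
# TTTGGG -> TTTGGG   (second alternative)
# GGG -> None        ([AGC]+ needs at least one character first)
# TTTGG -> None      (too short for either alternative)
```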
   
   Regular expressions are case-sensitive. The pattern 'AAAA' will not
   match the sequence 'aaaa'.
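   When a sequence may be stored in lower case, one general workaround
   is to convert the sequence and the pattern to the same case before
   searching. The sketch below shows this in Python with a hypothetical
   sequence; it illustrates the idea only and does not describe a dreg
   option.

```python
import re

seq = "ccaaaagg"   # hypothetical sequence stored in lower case

exact = re.search("AAAA", seq)                # None: 'A' does not match 'a'
normalized = re.search("AAAA", seq.upper())   # matches once both are upper case

print(exact, normalized.start() + 1)          # 1-based position of the match
# None 3
```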
   
Usage

   Here is a sample session with dreg.

% dreg
Input sequence: embl:paamir
Output file [paamir.dreg]:
Regular expression pattern: ggtacc

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-pattern]           regexp     Regular expression pattern
  [-outfile]           outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers  Description            Allowed values            Default
   [-sequence]           Sequence database USA  Readable sequence(s)      Required
   (Parameter 1)
   [-pattern]            Regular expression     Any regular expression    Required
   (Parameter 2)         pattern                pattern is accepted
   [-outfile]            Output file name       Output file               <sequence>.dreg
   (Parameter 3)

   Optional qualifiers   Allowed values  Default
   (none)

   Advanced qualifiers   Allowed values  Default
   (none)
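   When dreg is called from a script, the three mandatory qualifiers
   listed above can be passed explicitly. The sketch below assumes
   EMBOSS is installed with dreg on the PATH, and uses only the
   qualifiers documented above; the function names dreg_command and
   run_dreg are hypothetical.

```python
import shlex
import subprocess

def dreg_command(sequence: str, pattern: str, outfile: str) -> list[str]:
    """Build a dreg command line from the three mandatory qualifiers (hypothetical helper)."""
    return ["dreg", "-sequence", sequence, "-pattern", pattern, "-outfile", outfile]

def run_dreg(sequence: str, pattern: str, outfile: str) -> int:
    """Run dreg; assumes EMBOSS is installed and 'dreg' is on the PATH."""
    result = subprocess.run(dreg_command(sequence, pattern, outfile), check=False)
    return result.returncode

cmd = dreg_command("embl:paamir", "ggtacc", "paamir.dreg")
print(shlex.join(cmd))
# dreg -sequence embl:paamir -pattern ggtacc -outfile paamir.dreg
```

   Passing the arguments as a list bypasses the shell, so characters
   such as '|', '(' and '$' in a pattern are not interpreted by the
   shell. Since dreg always returns 0, the return code only shows that
   the program ran; the output file has to be read to find any matches.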
   
Input file format

   Any nucleic acid sequence.
   
Output file format

   This is the output from the example run. Sequence embl:paamir begins
   with a restriction site of sequence GGTACC, so the match is reported
   at position 1.

dreg search of embl:paamir with pattern GGTACC
Matches in PAAMIR
         PAAMIR     1 GGTACC
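   The report above lists each match with the sequence name, the 1-based
   start position and the matched text. The sketch below produces a
   report in a similar layout from Python's re module. It is an
   illustration only: the column widths are chosen to reproduce the
   sample line and are an assumption, the function name is hypothetical,
   and re.finditer reports only non-overlapping matches, which may
   differ from dreg's behaviour.

```python
import re

def dreg_like_report(usa: str, name: str, seq: str, pattern: str) -> str:
    """Format regex matches in a layout modelled on the dreg sample output.

    Positions are 1-based, as in the sample. Column widths are an assumption.
    """
    lines = [f"dreg search of {usa} with pattern {pattern}", f"Matches in {name}"]
    for m in re.finditer(pattern, seq):
        lines.append(f"{name:>15} {m.start() + 1:5d} {m.group()}")
    return "\n".join(lines)

# Hypothetical sequence with two GGTACC sites, at positions 1 and 9.
report = dreg_like_report("test:demo", "DEMO", "GGTACCATGGTACCTT", "GGTACC")
print(report)
```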

Data files

   None.
   
Notes

   None.
   
References

   None.
   
Warnings

   Regular expressions are case-sensitive. The pattern 'AAAA' will not
   match the sequence 'aaaa'.
   
Diagnostic Error Messages

   None.
   
Exit status

   Always returns 0.
   
Known bugs

   None.
   
See also

   Program name Description
   fuzznuc      Nucleic acid pattern search
   fuzztran     Protein pattern search after translation
   marscan      Finds MAR/SAR sites in nucleic sequences
   
   Other EMBOSS programs allow you to search for simple patterns and may
   be easier for the user who has never used regular expressions before:
     * fuzznuc - Nucleic acid pattern search
     * fuzzpro - Protein pattern search
     * fuzztran - Protein pattern search after translation
       
Author(s)

   This application was written by Peter Rice (pmr@sanger.ac.uk)
   Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus,
   Hinxton, Cambridge, CB10 1SA, UK.
   
History

   Written (1999) - Peter Rice.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
