
                              EMBOSS: restrict
     _________________________________________________________________
   
                               Program restrict
                                       
Function

   Finds restriction enzyme cleavage sites
   
Description

   restrict uses the REBASE database of restriction enzymes to predict
   cut sites in a DNA sequence. You can limit the number of cuts per
   enzyme to a range, treat the DNA as circular, allow or disallow
   matches to IUB ambiguity codes, and report blunt-end cutters,
   sticky-end cutters or both. You may also force the reporting of
   enzymes that cut at a single site only.
   
Usage

   Here is a sample session with restrict.

% restrict
Finds restriction enzyme cleavage sites
Input sequence(s): embl:hsfau
Minimum recognition site length [4]:
Comma separated enzyme list [all]:
Output file [hsfau.restrict]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
   -sitelen            integer    Minimum recognition site length
   -enzymes            string     The name 'all' reads in all enzyme names
                                  from the REBASE database. You can specify
                                  enzymes by giving their names with commas
                                  between them, such as:
                                  'HincII,hinfI,ppiI,hindiii'.
                                  The case of the names is not important. You
                                  can specify a file of enzyme names to read
                                  in by giving the name of the file holding
                                  the enzyme names with a '@' character in
                                  front of it, for example, '@enz.list'.
                                  Blank lines and lines starting with a hash
                                  character or '!' are ignored and all other
                                  lines are concatenated together with a comma
                                  character ',' and then treated as the list
                                  of enzymes to search for.
                                  An example of a file of enzyme names is:
                                  ! my enzymes
                                  HincII, ppiII
                                  ! other enzymes
                                  hindiii
                                  HinfI
                                  PpiI
  [-outfile]           report     (no help text) report value

   Optional qualifiers: (none)
   Advanced qualifiers:
   -min                integer    Minimum cuts per RE
   -max                integer    Maximum cuts per RE
   -single             bool       Force single site only cuts
   -[no]blunt          bool       Allow blunt end cutters
   -[no]sticky         bool       Allow sticky end cutters
   -[no]ambiguity      bool       Allow ambiguous matches
   -plasmid            bool       Allow circular DNA
   -[no]commercial     bool       Only enzymes with suppliers
   -datafile           string     Alternative RE data file
   -[no]limit          bool       Limits reports to one isoschizomer
   -preferred          bool       Report preferred isoschizomers
   -alphabetic         bool       Sort output alphabetically
   -fragments          bool       Show fragment lengths
   -name               bool       Show sequence name

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Qualifier        Description                         Allowed values            Default

   Mandatory qualifiers
   [-sequence]      Sequence database USA               Readable sequence(s)      Required
   (Parameter 1)
   -sitelen         Minimum recognition site length     Integer from 2 to 20      4
   -enzymes         The name 'all' reads in all enzyme  Any string is accepted    all
                    names from the REBASE database.
                    You can specify enzymes by giving
                    their names with commas between
                    them, such as:
                    'HincII,hinfI,ppiI,hindiii'. The
                    case of the names is not
                    important. You can specify a file
                    of enzyme names to read in by
                    giving the name of the file
                    holding the enzyme names with a
                    '@' character in front of it, for
                    example, '@enz.list'. Blank lines
                    and lines starting with a hash
                    character or '!' are ignored and
                    all other lines are concatenated
                    together with a comma character
                    ',' and then treated as the list
                    of enzymes to search for. An
                    example of a file of enzyme names
                    is:
                    ! my enzymes
                    HincII, ppiII
                    ! other enzymes
                    hindiii
                    HinfI
                    PpiI
   [-outfile]       (no help text) report value         Report file
   (Parameter 2)

   Optional qualifiers
   (none)

   Advanced qualifiers
   -min             Minimum cuts per RE                 Integer from 1 to 1000    1
   -max             Maximum cuts per RE                 Integer up to 2000000000  2000000000
   -single          Force single site only cuts         Yes/No                    No
   -[no]blunt       Allow blunt end cutters             Yes/No                    Yes
   -[no]sticky      Allow sticky end cutters            Yes/No                    Yes
   -[no]ambiguity   Allow ambiguous matches             Yes/No                    Yes
   -plasmid         Allow circular DNA                  Yes/No                    No
   -[no]commercial  Only enzymes with suppliers         Yes/No                    Yes
   -datafile        Alternative RE data file            Any string is accepted    An empty string
                                                                                  is accepted
   -[no]limit       Limits reports to one isoschizomer  Yes/No                    Yes
   -preferred       Report preferred isoschizomers      Yes/No                    No
   -alphabetic      Sort output alphabetically          Yes/No                    No
   -fragments       Show fragment lengths               Yes/No                    No
   -name            Show sequence name                  Yes/No                    No
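
   The rule given for enzyme name files under -enzymes (skip blank lines
   and lines starting with '#' or '!', then join the remaining lines
   with commas) can be illustrated with a short Python sketch. This is
   not the code restrict itself uses; the function name
   parse_enzyme_file is hypothetical, and lower-casing the names is an
   assumption made here only because the case of the names is stated to
   be unimportant.

```python
def parse_enzyme_file(text):
    """Turn the contents of an '@file' enzyme list into a list of names.

    Illustrative only: mirrors the documented rule, not restrict's source.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines ('#' or '!') are ignored.
        if not stripped or stripped[0] in "#!":
            continue
        kept.append(stripped)
    # All other lines are concatenated with a comma ...
    joined = ",".join(kept)
    # ... and then treated as one comma-separated list of enzymes.
    # Lower-casing is an assumption of this sketch (case is not important).
    return [name.strip().lower() for name in joined.split(",") if name.strip()]


example = """! my enzymes
HincII, ppiII
! other enzymes
hindiii
HinfI
PpiI
"""

print(parse_enzyme_file(example))
# ['hincii', 'ppiii', 'hindiii', 'hinfi', 'ppii']
```

   Applied to the example file shown for -enzymes, the sketch yields the
   same five names that would result from typing them as one
   comma-separated list.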
   
Input file format

   The input sequence can be one or more DNA sequences.
   
Output file format

   The output is a standard EMBOSS report file.
   
   The results can be output in one of several styles by using the
   command-line qualifier -rformat xxx, where 'xxx' is replaced by the
   name of the required format. The available format names are: embl,
   genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel,
   feattable, motif, regions, seqtable, simple, srs, table, tagseq
   
   See:
   http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for
   further information on report formats.
   
   By default restrict writes a 'table' report file.
   
   The output from restrict is plain text. For each hit it shows the base
   number, restriction enzyme name, recognition site and cut positions.
   Note that cuts are always to the right of the residue shown and that
   5' cuts are referred to by their associated 3' number sequence.
   
   The program reports enzymes that cut at two or four sites. The program
   also reports isoschizomers and enzymes having the same recognition
   sequence but different cut sites.
   
   Here is part of a sample output:
  __________________________________________________________________________

########################################
# Program: restrict
# Rundate: Mon Feb 11 13:46:44 2002
# Report_file: hsfau.restrict
########################################

#=======================================
#
# Sequence: HSFAU     from: 1   to: 518
# HitCount: 214
#
# Minimum cuts per enzyme: 1
# Maximum cuts per enzyme: 2000000000
# Minimum length of recognition site: 4
# Blunt ends allowed
# Sticky ends allowed
# DNA is linear
# Ambiguities allowed
#
#
#=======================================

USA               Start     End   Score Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
HSFAU                 3       6   0.000 MnlI        CCTC             13     12     .         .
HSFAU                 9      14   0.000 Hpy188III   TCNNGA           10     12     .         .
HSFAU                11      14   0.000 TaqI        TCGA             11     13     .         .
HSFAU                13      17   0.000 HinfI       GANTC            13     16     .         .
HSFAU                17      21   0.000 MlyI        GAGTC            7      7      .         .
HSFAU                17      21   0.000 PleI        GAGTC            7      8      .         .
HSFAU                24      27   0.000 AccII       CGCG             25     25     .         .
HSFAU                24      28   0.000 MboII       GAAGA            12     11     .         .
HSFAU                28      31   0.000 AciI        CCGC             25     27     .         .

etc.

HSFAU               437     443   0.000 DraII       RGGNCCY          438    441    .         .
HSFAU               438     442   0.000 AspS9I      GGNCC            438    441    .         .
HSFAU               438     443   0.000 BscBI       GGNNCC           440    440    .         .
HSFAU               438     441   0.000 BshFI       GGCC             439    439    .         .
HSFAU               438     441   0.000 CviJI       RGCY             439    439    .         .
HSFAU               454     459   0.000 AflII       CTTAAG           454    458    .         .
HSFAU               454     459   0.000 SmlI        CTYRAG           454    458    .         .
HSFAU               455     458   0.000 MseI        TTAA             455    457    .         .
HSFAU               468     471   0.000 Sse9I       AATT             467    471    .         .
HSFAU               474     477   0.000 CviJI       RGCY             475    475    .         .
HSFAU               492     495   0.000 CviJI       RGCY             493    493    .         .
HSFAU               497     501   0.000 BstDEI      CTNAG            497    500    .         .

#---------------------------------------
#---------------------------------------
  __________________________________________________________________________
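
   For use in scripts, a 'table' report like the sample above can be
   read back with a few lines of Python. The sketch below assumes that
   each hit is written on a single line with ten whitespace-separated
   columns in the order of the header, and that '.' marks an absent
   reverse-strand cut. The function name parse_restrict_table and the
   dictionary keys are hypothetical names chosen for this illustration.

```python
from collections import Counter

# Column order follows the header line of the 'table' report.
FIELDS = ("usa", "start", "end", "score", "enzyme", "site",
          "cut5", "cut3", "cut5rev", "cut3rev")


def parse_restrict_table(text):
    """Parse hit lines of a restrict 'table' report into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip '#' comment lines, blank or malformed lines, and the header.
        if line.startswith("#") or len(parts) != len(FIELDS) or parts[0] == "USA":
            continue
        row = dict(zip(FIELDS, parts))
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["score"] = float(row["score"])
        for key in ("cut5", "cut3", "cut5rev", "cut3rev"):
            # '.' is assumed to mean "no cut reported" for that column.
            row[key] = None if row[key] == "." else int(row[key])
        rows.append(row)
    return rows


sample = """\
# Sequence: HSFAU     from: 1   to: 518
USA               Start     End   Score Enzyme_name Restriction_site 5prime 3prime 5primerev 3primerev
HSFAU               438     441   0.000 CviJI       RGCY             439    439    .         .
HSFAU               455     458   0.000 MseI        TTAA             455    457    .         .
HSFAU               474     477   0.000 CviJI       RGCY             475    475    .         .
"""

rows = parse_restrict_table(sample)
hits = Counter(row["enzyme"] for row in rows)
print(hits.most_common())
# [('CviJI', 2), ('MseI', 1)]
```

   Counting hits per enzyme in this way gives the same quantity that the
   -min and -max qualifiers filter on.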

Data files

   The data files are stored in the REBASE directory of the standard
   EMBOSS data directory. The names are:
     * embossre.enz Cleavage information
     * embossre.ref Reference/methylation information
     * embossre.sup Supplier information
       
   The column information is described at the top of the data files
   
Notes

   Output file size is related to the size of the recognition site and
   the maximum number of allowed cutting positions. Setting the site
   length to six and restricting the cuts to two is a common choice of
   parameters. The size of the output can sometimes be reduced by
   specifying the -noambiguity switch.
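
   When restrict is called from a script, the parameter choice described
   above (site length six, at most two cuts, no ambiguous matches) can
   be assembled as an argument list. The Python sketch below only builds
   the command line and does not run it. The qualifier names and the
   range for -sitelen are taken from the qualifier table in this
   document; the function name build_restrict_command is hypothetical.

```python
def build_restrict_command(sequence, outfile, sitelen=6, max_cuts=2,
                           ambiguity=False):
    """Build an argument list for restrict (sketch; does not execute it)."""
    # Ranges as listed in the qualifier table of this document.
    if not 2 <= sitelen <= 20:
        raise ValueError("-sitelen must be between 2 and 20")
    if max_cuts > 2000000000:
        raise ValueError("-max must not exceed 2000000000")
    cmd = [
        "restrict",
        "-sequence", sequence,
        "-sitelen", str(sitelen),
        "-max", str(max_cuts),
        "-enzymes", "all",
        "-outfile", outfile,
    ]
    if not ambiguity:
        # -noambiguity can reduce the size of the output.
        cmd.append("-noambiguity")
    return cmd


cmd = build_restrict_command("embl:hsfau", "hsfau.restrict")
print(" ".join(cmd))
```

   The resulting list can be passed to subprocess.run on a system where
   EMBOSS is installed.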
   
   The data files must have been created before running this program.
   This is done by running the rebaseextract program with the "withrefm"
   file from a REBASE release. You may have to ask your system manager
   to do this.
   
References

    1. Nucleic Acids Research 27: 312-313 (1999).
       
Warnings

   The program will warn you if a protein sequence is given.
   
Diagnostic Error Messages

Exit status

   It exits with status 0 unless an error is reported.
   
Known bugs

See also

   Program name                          Description
   recoder      Remove restriction sites but maintain the same translation
   redata       Search REBASE for enzyme name, references, suppliers etc
   remap        Display a sequence with restriction cut sites, translation etc
   restover     Finds restriction enzymes that produce a specific overhang
   showseq      Display a sequence with features, translation etc
   silent       Silent mutation restriction enzyme scan
   
Author(s)

   This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)
   
History

   Completed 16th April 1999
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
