
                              EMBOSS: degapseq
     _________________________________________________________________
   
                               Program degapseq
                                       
Function

   Removes gap characters from sequences
   
Description

   degapseq reads in one or more sequences and writes them out again
   minus any gap characters. In effect it removes gaps from aligned
   sequences.
   
   In fact, it does more than this: it removes ANY non-alphabetic
   character from the input sequence. As well as the gap characters, it
   removes such things as the '*' in protein sequences that marks the
   position of a translated STOP codon.
   
   There are many different formats for storing sequences in files. Some
   sequence formats can store aligned sequences, including the positions
   where gaps were introduced to make the sequences align. A gap is
   marked by a special character at that position, and different
   formats use different characters. Some formats use more than one
   character to distinguish types of gap (e.g. gaps at the ends of the
   sequences, internal gaps, gaps introduced by a program or by a person
   editing the alignment, etc.). Typical gap characters include '.',
   '-' and '~'.
   
   When EMBOSS programs read in a sequence that contains gap characters,
   all of them are changed internally to '-'. EMBOSS has only one type
   of gap character, so any distinction between gap types in the input
   is lost.
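
   The normalisation described above can be pictured with a short Python
   sketch. It is an illustration only and not EMBOSS code: the function
   name normalize_gaps and the set of characters treated as gaps are
   assumptions, taken from the typical characters named in the text.

```python
# Illustrative sketch only. ASSUMED_GAP_CHARS is an assumption based on the
# "typical" gap characters named in the text, not the list EMBOSS really uses.
ASSUMED_GAP_CHARS = ".~-"

# Translation table mapping every assumed gap character to '-'.
_TO_DASH = str.maketrans({ch: "-" for ch in ASSUMED_GAP_CHARS})


def normalize_gaps(sequence: str) -> str:
    """Return the sequence with every assumed gap character replaced by '-'."""
    return sequence.translate(_TO_DASH)
```

   With this sketch, normalize_gaps("ATG....CTG~~~~ACG-T") gives
   "ATG----CTG----ACG-T": the length is unchanged and only the kind of
   gap character differs.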
   
   degapseq removes any non-alphabetic character from the sequence; in
   effect, this means that gaps and '*' characters are removed. The
   sequence is then written out.
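
   As a rough model of this behaviour, the following Python sketch keeps
   only alphabetic characters. It is not the degapseq source; the name
   degap is hypothetical, and treating "alphabetic" as the ASCII letters
   A-Z and a-z is an assumption.

```python
import string

# Assumption: "alphabetic" means the ASCII letters A-Z and a-z.
ASCII_LETTERS = frozenset(string.ascii_letters)


def degap(sequence: str) -> str:
    """Drop every character that is not an ASCII letter.

    This removes gap characters such as '-', '.' and '~', and also the
    '*' that marks a translated STOP codon in protein sequences.
    """
    return "".join(ch for ch in sequence if ch in ASCII_LETTERS)
```

   For example, degap("ATG--C.G~T") gives "ATGCGT" and degap("MKV*LL-")
   gives "MKVLL".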
   
Usage

   Here is a sample session with degapseq:

% degapseq alignment.seq nogaps.seq

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-outseq]            seqoutall  Output sequence(s) USA

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers                               Allowed values         Default
   [-sequence] (Parameter 1)  Sequence database USA   Readable sequence(s)   Required
   [-outseq] (Parameter 2)    Output sequence(s) USA  Writeable sequence(s)  <sequence>.format

   Optional qualifiers                                Allowed values         Default
   (none)

   Advanced qualifiers                                Allowed values         Default
   (none)
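
   For use from a script, the two mandatory qualifiers can be passed by
   name. The Python sketch below is one possible wrapper, not part of
   EMBOSS: the function names are hypothetical, and it assumes that a
   degapseq executable is on the PATH. The sample session above passes
   the same two values positionally.

```python
import subprocess
from typing import List


def build_degapseq_command(sequence_usa: str, outseq_usa: str) -> List[str]:
    """Build the argument list using the named mandatory qualifiers."""
    return ["degapseq", "-sequence", sequence_usa, "-outseq", outseq_usa]


def run_degapseq(sequence_usa: str, outseq_usa: str) -> None:
    """Run degapseq; raises CalledProcessError on a non-zero exit status.

    Assumes the degapseq executable is installed and on the PATH.
    """
    subprocess.run(build_degapseq_command(sequence_usa, outseq_usa), check=True)
```

   Calling run_degapseq("alignment.seq", "nogaps.seq") would then
   correspond to the sample session shown under Usage.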
   
Input file format

   Any valid input sequence USA is allowed.
   
   The input sequence can be nucleic or protein.
   
   The input sequence can be gapped or ungapped.
   
   An example of input sequences with gaps might be:
   
>dgshsh
ATGCGCAGGTACGTATG....CTGACGGTACGTGATCGA-GCTGA-CGAGCGTATGC-----
>hsf1
--------TGACTGATGCTGA~~~~CTG-ACGTGACTGATGCTGATCGTGACTGATCGTGAC
>myclone1
ATGCGCAGGTACGTATGCTGACGGTACGTGATCGA-GCTGA-CGAGCGTATGC-----

Output file format

   The output is a sequence with no gaps.
   
   An example is the output produced from the input sequences above:
   
>dgshsh
ATGCGCAGGTACGTATGCTGACGGTACGTGATCGAGCTGACGAGCGTATGC
>hsf1
TGACTGATGCTGACTGACGTGACTGATGCTGATCGTGACTGATCGTGAC
>myclone1
ATGCGCAGGTACGTATGCTGACGGTACGTGATCGAGCTGACGAGCGTATGC
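
   The relationship between the example input and output can be
   reproduced with a small Python sketch that works on FASTA-style text.
   It is an approximation for illustration, not the program itself: the
   function names are hypothetical, only '>' header lines and sequence
   lines are handled, and each sequence is written on a single line.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-style text."""
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def degap_fasta(text: str) -> str:
    """Return FASTA-style text with non-alphabetic characters removed
    from every sequence (ASCII letters only, which is an assumption)."""
    lines: List[str] = []
    for name, seq in read_fasta(text):
        lines.append(">" + name)
        lines.append("".join(ch for ch in seq if ch.isascii() and ch.isalpha()))
    return "".join(line + "\n" for line in lines)
```

   Applied to the gapped input shown under "Input file format", this
   sketch yields the same three ungapped sequences as listed above.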

Data files

   None.
   
Notes

   None.
   
References

   None.
   
Warnings

   degapseq removes '*' characters from protein sequences as well as
   the gap characters, so the positions of translated STOP codons are
   lost from the output.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with status 0.
   
Known bugs

   None.
   
See also

   Program name                          Description
   biosed       Replace or delete sequence sections
   cutseq       Removes a specified section from a sequence
   descseq      Alter the name or description of a sequence
   entret       Reads and writes (returns) flatfile entries
   extractfeat  Extract features from a sequence
   extractseq   Extract regions from a sequence
   listor       Writes a list file of the logical OR of two sets of sequences
   maskfeat     Mask off features of a sequence
   maskseq      Mask off regions of a sequence
   newseq       Type in a short new sequence
   noreturn     Removes carriage return from ASCII files
   notseq       Excludes a set of sequences and writes out the remaining ones
   nthseq       Writes one sequence from a multiple set of sequences
   pasteseq     Insert one sequence into another
   revseq       Reverse and complement a sequence
   seqret       Reads and writes (returns) sequences
   seqretsplit  Reads and writes (returns) sequences in individual files
   splitter     Split a sequence into (overlapping) smaller sequences
   swissparse   Retrieves sequences from swissprot using keyword search
   trimest      Trim poly-A tails off EST sequences
   trimseq      Trim ambiguous bits off the ends of sequences
   union        Reads sequence fragments and builds one sequence
   vectorstrip  Strips out DNA between a pair of vector sequences
   yank         Reads a sequence range, appends the full USA to a list file
   
Author(s)

   This application was written by Gary Williams
   (gwilliam@hgmp.mrc.ac.uk)
   
History

   Written (6 March 2001) - Gary Williams
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
