
                               EMBOSS: scope
     _________________________________________________________________
   
                                 Program scope
                                       
Function

   Convert raw scop classification file to embl-like format
   
Description

   Nearly all proteins have structural similarities with other proteins
   and, in some of these cases, share a common evolutionary origin. A
   knowledge of these relationships is crucial to our understanding of
   the evolution of proteins and of development. It will also play an
   important role in the analysis of the sequence data that is being
   produced by worldwide genome projects.
   
   The SCOP database aims to provide a detailed and comprehensive
   description of the structural and evolutionary relationships between
   all proteins whose structure is known, including all entries in the
   Protein Data Bank (PDB).
   
   scope reads the SCOP classification file available at
   http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?dir=lin
   
   scope writes the SCOP classification to an EMBL-like format file.
   
   No changes are made to the data other than changing the format in
   which it is held.
   
   This EMBL-like format SCOP file is used by several other EMBOSS
   programs.
   
   The SCOP data are converted to an EMBL-like format before being used
   by other EMBOSS programs because that format is easier to work with
   than the native SCOP database format.
   
Usage

   Here is a sample session with scope:
   

% scope
Convert raw scop classification file to embl-like format
Name of scop file for input (raw format) [scop.orig]: /data/scop/scop.orig
Name of scop file for output (embl-like format) [Escop.dat]: Escop.test

Command line arguments

   Mandatory qualifiers:
  [-infile]            infile     Name of scop file for input (raw format)
  [-outfile]           outfile    Name of scop file for output (embl-like
                                  format)

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers  Description                          Allowed values  Default
   [-infile]             (Parameter 1) Name of scop file for  Input file      scop.orig
                         input (raw format)
   [-outfile]            (Parameter 2) Name of scop file for  Output file     Escop.dat
                         output (embl-like format)

   Optional qualifiers   Allowed values  Default
   (none)

   Advanced qualifiers   Allowed values  Default
   (none)
   
Input file format

   The native format SCOP database input file is available at
   http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?dir=lin
   
   The format of this file is explained at
   http://scop.mrc-lmb.cam.ac.uk/scop/parse/index.html
   
   The classification file read by scope contains a single line for each
   domain in SCOP, including text describing the position of the domain
   in the SCOP hierarchy. Note that other SCOP classification files,
   without this annotation, are available at
   http://scop.mrc-lmb.cam.ac.uk/scop/parse/index.html
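   As an illustration of how one classification line maps onto the
   output records, the sketch below assumes a hypothetical line layout
   (a 7-character domain identifier followed by "Label: text" fields
   separated by semicolons). That layout is an assumption made for this
   example only; consult the format description linked above for the
   real one. The label-to-record mapping (Species to OS, Class to CL,
   and so on) and the derivation of EN from the ID follow the output
   record descriptions in this document. The function name
   parse_raw_line is hypothetical.

```python
# ASSUMPTION: hypothetical raw-line layout, for illustration only:
#   <7-char domain id> Class: <text>; Fold: <text>; ... ; Species: <text>
# The label -> record-code mapping is taken from scope's output-record list.
LABEL_TO_RECORD = {
    "Species": "OS",
    "Class": "CL",
    "Fold": "FO",
    "Superfamily": "SF",
    "Family": "FA",
    "Protein": "DO",
}


def parse_raw_line(line: str) -> dict[str, str]:
    """Map one (assumed-layout) classification line to EMBL-like record codes."""
    domain_id = line[:7]  # ID: the first 7 characters of the line, unchanged
    records = {"ID": domain_id, "EN": domain_id[1:5]}  # EN: characters 2-5
    for field in line[7:].split(";"):
        label, sep, text = field.strip().partition(":")
        if sep and label in LABEL_TO_RECORD:
            records[LABEL_TO_RECORD[label]] = text.strip()
    return records


if __name__ == "__main__":
    raw = ("D3SDHA_ Class: All alpha proteins; Fold: Globin-like; "
           "Superfamily: Globin-like; Family: Globins; Protein: Hemoglobin I; "
           "Species: Ark clam (Scapharca inaequivalvis)")
    for code, text in parse_raw_line(raw).items():
        print(f"{code}   {text}")
```

   Splitting on the first colon only means that any colon inside the
   descriptive text is kept as part of that text.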
   
Output file format

   The output records used to describe an entry are given below. Records
   (4) to (8) are used to describe the position of the domain in the scop
   hierarchy.
   
    1. ID - Domain identifier code. This is a 7-character code that
       uniquely identifies the domain in scop. It is identical to the
       first 7 characters of a line in the scop classification file. The
       first character is always 'D', the next four characters are the
       PDB identifier code, the sixth character is the PDB chain
       identifier to which the domain belongs (a '.' is given in cases
       where the domain is composed of multiple chains, a '_' is given
       where a chain identifier was not specified in the PDB file) and
       the final character is the number of the domain in the chain (for
       chains comprising more than one domain) or '_' (the chain
       comprises a single domain only).
    2. EN - PDB identifier code. This is the 4-character PDB identifier
       code of the PDB entry containing the domain.
    3. OS - Source of the protein. It is identical to the text given
       after 'Species' in the scop classification file.
    4. CL - Domain class. It is identical to the text given after 'Class'
       in the scop classification file.
    5. FO - Domain fold. It is identical to the text given after 'Fold'
       in the scop classification file.
    6. SF - Domain superfamily. It is identical to the text given after
       'Superfamily' in the scop classification file.
    7. FA - Domain family. It is identical to the text given after
       'Family' in the scop classification file.
    8. DO - Domain name. It is identical to the text given after
       'Protein' in the scop classification file.
    9. NC - Number of chains comprising the domain (usually 1). The
       domain entry has a section containing a CN and a CH record (see
       below) for each chain, so a domain made up of several chains has
       several such sections.
   10. CN - Chain number. The number given in brackets after this record
       indicates the start of the data for the relevant chain.
   11. CH - Domain definition. The character given before CHAIN is the
       PDB chain identifier (a '.' is given in cases where a chain
       identifier was not specified in the scop classification file), the
       strings before START and END give the start and end positions
       respectively of the domain in the PDB file (a '.' is given in
       cases where a position was not specified). Note that the start and
       end positions refer to the residue numbering given in the original
       PDB file and therefore must be treated as strings (PDB residue
       numbers can carry an insertion code, e.g. 52A).
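   The ID and CH record rules above can be expressed as a small decoder.
   The sketch below is a minimal illustration, not scope's own code: the
   names DomainId, decode_id and parse_ch are hypothetical, and it
   assumes the CH value has exactly the spacing shown in the example
   output ("a CHAIN; . START; . END;"). Start and end positions are
   returned as strings, as the CH description requires, and '.' is
   mapped to None.

```python
import re
from typing import NamedTuple, Optional, Tuple


class DomainId(NamedTuple):
    pdb: str               # characters 2-5: PDB identifier code
    chain: Optional[str]   # character 6, or None when it is '.' or '_'
    multi_chain: bool      # True when character 6 is '.' (multi-chain domain)
    domain: Optional[str]  # character 7, or None when '_' (single-domain chain)


def decode_id(domain_id: str) -> DomainId:
    """Split a 7-character ID record value into its documented parts."""
    if len(domain_id) != 7 or domain_id[0] != "D":
        raise ValueError(f"not a scope domain ID: {domain_id!r}")
    chain_char, domain_char = domain_id[5], domain_id[6]
    return DomainId(
        pdb=domain_id[1:5],
        chain=None if chain_char in "._" else chain_char,
        multi_chain=chain_char == ".",
        domain=None if domain_char == "_" else domain_char,
    )


# ASSUMPTION: CH values use the exact spacing of the example output.
CH_PATTERN = re.compile(r"^(\S) CHAIN; (\S+) START; (\S+) END;$")


def _dot_to_none(value: str) -> Optional[str]:
    """'.' marks an unspecified chain identifier or position."""
    return None if value == "." else value


def parse_ch(value: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (chain, start, end) from the text that follows 'CH   '."""
    match = CH_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised CH record: {value!r}")
    chain, start, end = match.groups()
    return _dot_to_none(chain), _dot_to_none(start), _dot_to_none(end)
```

   For the first entry in the example output, decode_id("D3SDHA_") gives
   PDB code 3SDH, chain A and no domain number, and parse_ch on its CH
   value gives chain 'a' with unspecified start and end positions.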
       
   An excerpt from an output file follows:
     _________________________________________________________________
   
ID   D3SDHA_
XX
EN   3SDH
XX
OS   Ark clam (Scapharca inaequivalvis)
XX
CL   All alpha proteins
XX
FO   Globin-like
XX
SF   Globin-like
XX
FA   Globins
XX
DO   Hemoglobin I
XX
NC   1
XX
CN   [1]
XX
CH   a CHAIN; . START; . END;
//
ID   D3SDHB_
XX
EN   3SDH
XX
OS   Ark clam (Scapharca inaequivalvis)
XX
CL   All alpha proteins
XX
FO   Globin-like
XX
SF   Globin-like
XX
FA   Globins
XX
DO   Hemoglobin I
XX
NC   1
XX
CN   [1]
XX
CH   b CHAIN; . START; . END;
//
     _________________________________________________________________
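   The layout of the excerpt (a two-letter record code, three spaces and
   the value; an XX line between consecutive records; // at the end of
   each entry) can be reproduced and read back with the minimal sketch
   below. It is an illustration only: format_entry and read_entries are
   hypothetical names, values are never wrapped across lines, and for
   multi-chain domains it assumes each CN and CH record is also separated
   by XX lines, which the excerpt (NC 1 only) does not show.

```python
from typing import Dict, List, Tuple

# Single-valued records, in the order used in the example excerpt.
HEADER_CODES = ["ID", "EN", "OS", "CL", "FO", "SF", "FA", "DO"]


def format_entry(fields: Dict[str, str], chains: List[str]) -> str:
    """Render one entry; `chains` holds one CH value per chain."""
    records = [(code, fields[code]) for code in HEADER_CODES]
    records.append(("NC", str(len(chains))))
    # ASSUMPTION: one CN/CH pair per chain, numbered from 1.
    for number, ch_value in enumerate(chains, start=1):
        records.append(("CN", f"[{number}]"))
        records.append(("CH", ch_value))
    lines: List[str] = []
    for index, (code, value) in enumerate(records):
        if index:
            lines.append("XX")  # spacer line between consecutive records
        lines.append(f"{code}   {value}")  # 2-letter code + 3 spaces + value
    lines.append("//")  # end-of-entry marker
    return "\n".join(lines) + "\n"


def read_entries(text: str) -> List[List[Tuple[str, str]]]:
    """Split a file into entries of (code, value) pairs, skipping XX lines."""
    entries: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if line == "//":
            entries.append(current)
            current = []
        elif line and line != "XX":
            current.append((line[:2], line[5:]))  # value starts at column 6
    return entries


if __name__ == "__main__":
    entry = format_entry(
        {"ID": "D3SDHA_", "EN": "3SDH",
         "OS": "Ark clam (Scapharca inaequivalvis)",
         "CL": "All alpha proteins", "FO": "Globin-like",
         "SF": "Globin-like", "FA": "Globins", "DO": "Hemoglobin I"},
        ["a CHAIN; . START; . END;"],
    )
    print(entry, end="")
```

   Run as a script, this prints the first entry of the excerpt above.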
   
Data files

   None.
   
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with status 0.
   
Known bugs

   None.
   
See also

   Program name   Description
   cutgextract    Extract data from CUTG
   domainer       Build domain coordinate files
   nrscope        Converts redundant EMBL-format SCOP file to non-redundant one
   pdbtosp        Convert raw swissprot:pdb equivalence file to embl-like format
   printsextract  Extract data from PRINTS
   prosextract    Builds the PROSITE motif database for patmatmotifs to search
   rebaseextract  Extract data from REBASE
   scopparse      Reads raw-, and writes EMBL-like, scop classification files
   seqnr          Converts redundant database results to a non-redundant set of hits
   tfextract      Extract data from TRANSFAC
   
Author(s)

   This application was written by Jon Ison (jison@hgmp.mrc.ac.uk)
   
History

   Written (Jan 2001) - Jon Ison.
   
Target users

   This program is intended to be run by EMBOSS site maintainers or those
   responsible for setting up and maintaining protein 3D structural data
   for use by others.
   
Comments
