
                              EMBOSS: contacts
     _________________________________________________________________
   
                               Program contacts
                                       
Function

   Reads coordinate files and writes contact files
   
Description

   contacts parses embl-like clean coordinate files generated by the
   coorde application (not currently in EMBOSS; email Jon Ison,
   jison@hgmp.mrc.ac.uk) or the domainer application. For each coordinate
   file in a given directory it writes a file of residue-residue contact
   data in embl-like format. Each of these files contains residue contact
   data for each chain of every model in the coordinate file (or for the
   single scop domain in the case where domainer output is read).
   
   Two residues are defined as being in contact when the van der Waals
   surface of any atom of the first residue comes within the threshold
   contact distance of the van der Waals surface of any atom of the
   second residue. The threshold contact distance is user-defined and
   has a default value of 1 Angstrom.
   
   The following van der Waals radii are used:
   
C:1.8 Angstrom
O:1.4 Angstrom
N:1.7 Angstrom
S:2.0 Angstrom
H:1.0 Angstrom (default for other or unknown atom types)
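
   The contact definition above can be sketched in a few lines of Python.
   This is only an illustration and not the code used by contacts: the
   atom tuples and function names are hypothetical, and treating a gap
   exactly equal to the threshold as a contact is an assumption.

```python
import math

# Radii in Angstroms, copied from the list above.
VDW_RADII = {"C": 1.8, "O": 1.4, "N": 1.7, "S": 2.0}
DEFAULT_RADIUS = 1.0  # H, and any other or unknown atom type


def surface_gap(atom_a, atom_b):
    """Distance between the van der Waals surfaces of two atoms.

    Each atom is a hypothetical (element, x, y, z) tuple.
    """
    elem_a, *xyz_a = atom_a
    elem_b, *xyz_b = atom_b
    centre_distance = math.dist(xyz_a, xyz_b)
    radius_a = VDW_RADII.get(elem_a, DEFAULT_RADIUS)
    radius_b = VDW_RADII.get(elem_b, DEFAULT_RADIUS)
    return centre_distance - radius_a - radius_b


def in_contact(residue_a, residue_b, thresh=1.0):
    """True if any atom pair has a surface gap within thresh (assumed <=)."""
    return any(surface_gap(a, b) <= thresh
               for a in residue_a for b in residue_b)
```

   For instance, a carbon and an oxygen whose centres are 4.0 Angstroms
   apart have a surface gap of 4.0 - 1.8 - 1.4 = 0.8 Angstrom, which is
   within the default threshold.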

Usage

   Here is a sample session with contacts:

% contacts
Reads coordinate files and writes contact files
Location of coordinate files for input (embl-like format) [./]:
Extension of coordinate files (embl-like format) [.pxyz]:
Location of contact files for output [./]:
Extension of contact files [.con]:
Threshold contact distance [1.0]:
Name of data file with van der Waals radii [Evdw.dat]:
Name of log file for the build [contacts.log]:
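
   For scripted use, the same run can be assembled as an argument list.
   The sketch below is a hypothetical Python helper, not part of EMBOSS;
   it uses only the qualifier names and defaults documented on this page,
   and it assumes that supplying every qualifier on the command line is
   enough to avoid the interactive prompts.

```python
import shlex


def build_contacts_command(cpdb="./", cpdbextn=".pxyz", thresh=1.0,
                           ignore=20.0, vdwf="Evdw.dat", con="./",
                           conextn=".con", conerrf="contacts.log"):
    """Return the argument list for one contacts run (hypothetical helper)."""
    options = {
        "cpdb": cpdb, "cpdbextn": cpdbextn, "thresh": thresh,
        "ignore": ignore, "vdwf": vdwf, "con": con,
        "conextn": conextn, "conerrf": conerrf,
    }
    command = ["contacts"]
    for name, value in options.items():
        command += ["-" + name, str(value)]
    return command


command = build_contacts_command(thresh=1.5)
print(shlex.join(command))
# To execute it (needs an installation that provides contacts):
# import subprocess; subprocess.run(command, check=True)
```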

Command line arguments

   Mandatory qualifiers:
  [-cpdb]              string     Location of coordinate files for input
                                  (embl-like format)
  [-cpdbextn]          string     Extension of coordinate files (embl-like
                                  format)
  [-thresh]            float      Threshold contact distance
  [-ignore]            float      If any two atoms from two different residues
                                  are at least this distance apart then no
                                  further inter-atomic contacts will be checked
                                  for that residue pair. This speeds the
                                  calculation up considerably.
  [-vdwf]              string     Name of data file with van der Waals radii
  [-con]               string     Location of contact files for output
  [-conextn]           string     Extension of contact files
  [-conerrf]           outfile    Name of log file for the build

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers

   [-cpdb] (Parameter 1)
      Description:    Location of coordinate files for input (embl-like
                      format)
      Allowed values: Any string is accepted
      Default:        ./
   [-cpdbextn] (Parameter 2)
      Description:    Extension of coordinate files (embl-like format)
      Allowed values: Any string is accepted
      Default:        .pxyz
   [-thresh] (Parameter 3)
      Description:    Threshold contact distance
      Allowed values: Any floating-point value
      Default:        1.0
   [-ignore] (Parameter 4)
      Description:    If any two atoms from two different residues are at
                      least this distance apart then no further
                      inter-atomic contacts will be checked for that
                      residue pair. This speeds the calculation up
                      considerably.
      Allowed values: Any floating-point value
      Default:        20.0
   [-vdwf] (Parameter 5)
      Description:    Name of data file with van der Waals radii
      Allowed values: Any string is accepted
      Default:        Evdw.dat
   [-con] (Parameter 6)
      Description:    Location of contact files for output
      Allowed values: Any string is accepted
      Default:        ./
   [-conextn] (Parameter 7)
      Description:    Extension of contact files
      Allowed values: Any string is accepted
      Default:        .con
   [-conerrf] (Parameter 8)
      Description:    Name of log file for the build
      Allowed values: Output file
      Default:        contacts.log

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
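
   The effect of the -ignore qualifier can be illustrated with a short
   Python sketch. It is a reading of the description of -ignore and not
   the program's own code: the function name and the (element, x, y, z)
   atom tuples are hypothetical, and measuring the -ignore distance
   between atom centres is an assumption.

```python
import math


def residues_in_contact(residue_a, residue_b, radii,
                        thresh=1.0, ignore=20.0, default_radius=1.0):
    """Contact test with an early exit for residue pairs that are far apart."""
    for elem_a, *xyz_a in residue_a:
        for elem_b, *xyz_b in residue_b:
            distance = math.dist(xyz_a, xyz_b)
            if distance >= ignore:
                # One distant atom pair is taken to mean the residues are
                # too far apart; skip the remaining pairs for this residue pair.
                return False
            gap = (distance
                   - radii.get(elem_a, default_radius)
                   - radii.get(elem_b, default_radius))
            if gap <= thresh:
                return True
    return False
```

   Because residues are small compared with the default -ignore value of
   20.0, a single distant atom pair is normally enough to rule out a
   contact, which is where the saving in computation comes from.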
   
Input file format

   contacts reads embl-like clean coordinate files generated by the
   coorde application or the domainer application.
   
   For example:
     _________________________________________________________________
   
ID   D1HBBA_
XX
DE   Co-ordinates for SCOP domain D1HBBA_
XX
OS   See Escop.dat for domain classification
XX
EX   METHOD xray; RESO 1.90; NMOD 1; NCHA 1;
XX
CN   [1]
XX
IN   ID A; NR 141; NH 0; NW 0;
XX
SQ   SEQUENCE   141 AA;  15127 MW;  5EC7DB1E CRC32;
     VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
     KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
     VHASLDKFLA SVSTVLTSKY R
XX
CO   1    1    P    1     1     V    VAL    N      7.155   17.725 4.424     1.00    37.82
CO   1    1    P    1     1     V    VAL    CA     7.854   18.800 3.718     1.00    35.10
CO   1    1    P    1     1     V    VAL    C      9.366   18.565 3.754     1.00    31.92
CO   1    1    P    1     1     V    VAL    O      9.861   17.961 4.721     1.00    35.01
CO   1    1    P    1     1     V    VAL    CB     7.529   20.168 4.360     1.00    47.63
CO   1    1    P    1     1     V    VAL    CG1    7.806   21.300 3.369     1.00    62.84
CO   1    1    P    1     1     V    VAL    CG2    6.136   20.244 4.936     1.00    54.85
CO   1    1    P    2     2     L    LEU    N     10.032   19.062 2.731     1.00    27.38
CO   1    1    P    2     2     L    LEU    CA    11.496   18.967 2.657     1.00    23.24
CO   1    1    P    2     2     L    LEU    C     12.077   20.110 3.496     1.00    22.99
CO   1    1    P    2     2     L    LEU    O     11.672   21.259 3.289     1.00    25.22
     _________________________________________________________________
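
   A minimal Python sketch for pulling atoms out of the CO records is
   given below. The column meanings are not documented on this page, so
   the field positions are assumptions read off the example: field 5 is
   taken as the residue number, field 8 as the three-letter residue name,
   field 9 as the atom name and fields 10 to 12 as the x, y and z
   coordinates. The Atom type and function name are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    residue_number: int
    residue_name: str
    atom_name: str
    x: float
    y: float
    z: float


def parse_co_records(text):
    """Collect one Atom per CO line; every other line is skipped."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12 or fields[0] != "CO":
            continue
        atoms.append(Atom(int(fields[4]), fields[7], fields[8],
                          float(fields[9]), float(fields[10]),
                          float(fields[11])))
    return atoms
```

   Only the fields up to the z coordinate are read, so trailing values on
   a CO line, or on lines that do not start with CO, are ignored.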
   
Output file format

   The embl-like format used for the contact files uses the following
   records:
   
    1. ID - either the 4-character PDB identifier code (where clean
       protein coordinate files are used as input) or the 7-character
       domain identifier code taken from scop (where domain coordinate
       files were used as input; see documentation for the EMBOSS
       application scope for further info.)
    2. DE - bibliographic information. The text "Residue-residue contact
       data" is always given.
    3. EX - experimental information. The value of the threshold contact
       distance is given as a floating point number after 'THRESH'. The
       number of models and number of polypeptide chains are given after
       'NMOD' and 'NCHA' respectively. For domain coordinate files a 1 is
       always given. Following the EX record, the file will have a
       section containing CN, IN and SM records (see below) for each
       chain. The sections for each chain of a model are given after the
       MO record.
    4. MO - model number. The number given in brackets after this record
       indicates the start of a section of model-specific data.
    5. CN - chain number. The number given in brackets after this record
       indicates the start of a section of chain-specific data.
    6. IN - chain-specific data. The character given after ID is the PDB
       chain identifier taken from the input file (a '.' is given in
       cases where a chain identifier was not specified in the original
       pdb file or, for domain coordinate files, the domain is comprised
       of more than one chain). The number of amino acid residues
       comprising the chain (or the chains from which a domain is
       comprised) is given after NR. The number of residue-residue
       contacts is given after NSMCON.
    7. SM - Line of residue contact data. Pairs of amino acid identifiers
       and residue numbers are delimited by a ';'. Residue numbers are
       taken from the clean coordinate file and give a correct index into
       the sequence (i.e. they are not necessarily the same as the
       original pdb file).
    8. XX - used for spacing.
    9. // - given on the last line of the file only.
       
   Note - SM records are used for contacts between either side-chain or
   main-chain atoms as defined above. In a future implementation, SS
   will be used for side-chain only contacts, MM will be used for
   main-chain only contacts, and there will probably be several other
   forms of contact too.
   
   Example contacts output file:
     _________________________________________________________________
   
ID   D1HBBB_
XX
DE   Residue-residue side-chain contact data
XX
EX   THRESH 10.0; NMOD 1; NCHA 1;
XX
MO   [1]
XX
CN   [1]
XX
IN   ID B; NR 146; NSMCON 2514;
XX
SM   VAL 1 ; HIS 2
SM   VAL 1 ; LEU 3
SM   VAL 1 ; THR 4
SM   VAL 1 ; PRO 5
SM   VAL 1 ; GLU 6
SM   VAL 1 ; GLU 7
SM   VAL 1 ; LYS 8
SM   VAL 1 ; VAL 11
SM   VAL 1 ; PHE 71
//
     _________________________________________________________________
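
   The SM records are simple to read back into a program. The following
   Python sketch (the function name is hypothetical) turns each SM line
   into a pair of (residue name, residue number) tuples, assuming the
   'NAME NUMBER ; NAME NUMBER' layout shown in the example.

```python
def parse_sm_records(text):
    """Return a list of ((name, number), (name, number)) contact pairs."""
    contacts = []
    for line in text.splitlines():
        if not line.startswith("SM"):
            continue
        # Drop the two-character record code, then split on the ';' delimiter.
        left, right = line[2:].split(";")
        name_a, number_a = left.split()
        name_b, number_b = right.split()
        contacts.append(((name_a, int(number_a)), (name_b, int(number_b))))
    return contacts
```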
   
   contacts generates a log file, an excerpt of which is shown below. If
   there is a problem in processing a coordinate file, three lines are
   written: the record '//', the scop domain or pdb identifier code, and
   an error message. The text 'WARN file open error filename', 'ERROR
   file read error filename' or 'ERROR file write error filename' is
   reported when an error is encountered during a file open, read or
   write respectively. Various other error messages may also be given
   (in case of difficulty email Jon Ison, jison@hgmp.mrc.ac.uk).
   
   Example log file
     _________________________________________________________________
   
//
DS002__
WARN  Could not open for reading cpdb file s002.pxyz
//
DS003__
WARN  Could not open for reading cpdb file s003.pxyz
     _________________________________________________________________
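
   A log in this three-line layout can be summarised with a few lines of
   Python. The sketch below is illustrative only (the function name is
   hypothetical); it assumes that every '//' line is followed by an
   identifier line and then a message line whose first word is the
   severity.

```python
def parse_contacts_log(text):
    """Return a list of (identifier, severity, detail) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    for i, line in enumerate(lines):
        if line == "//" and i + 2 < len(lines):
            identifier, message = lines[i + 1], lines[i + 2]
            severity, _, detail = message.partition(" ")
            entries.append((identifier, severity, detail.strip()))
    return entries
```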
   
Data files

   contacts reads in data on van der Waals radii for atoms in proteins
   from the data file Evdw.dat (by default).
   
   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.
   
   To see the available EMBOSS data files, run:
   
% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your
   current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

   Users can provide their own data files in their own directories.
   Project-specific files can be put in the current directory or, for
   tidier directory listings, in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory or,
   again, in a subdirectory called ".embossdata".
   
   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
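
   The search order listed above can be mimicked in Python as follows.
   This is a sketch for checking which copy of a data file would be
   found first among those four directories; it is not how EMBOSS itself
   locates files, and the function name is hypothetical.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first existing copy of name in the listed order, or None."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    search_order = (cwd, cwd / ".embossdata", home, home / ".embossdata")
    for directory in search_order:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

   For example, find_data_file("Evdw.dat") returns the path of a local
   copy of the radii file if one exists in any of those directories.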
       
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with status 0.
   
Known bugs

   None.
   
See also

   Program name   Description
   dichet         Parse dictionary of heterogen groups
   interface      Reads coordinate files and writes inter-chain contact
                  files
   psiblasts      Runs PSI-BLAST given scopalign alignments
   scopalign      Generate alignments for SCOP families
   seqsort        Removes ambiguities from a set of hits resulting from a
                  database search
   siggen         Generates a sparse protein signature
   sigscan        Scans a sparse protein signature against swissprot
   
Author(s)

   This application was written by Jon Ison (jison@hgmp.mrc.ac.uk)
   
History

   Written (June 2001) - Jon Ison
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
