
                              EMBOSS: pepstats
     _________________________________________________________________
   
                               Program pepstats
                                       
Function

   Protein statistics
   
Description

   pepstats outputs a report of simple protein sequence information
   including:
   
     * molecular weight
     * number of residues
     * average residue weight
     * charge
     * isoelectric point
     * for each type of amino acid: number, molar percent, DayhoffStat
     * for each physico-chemical class of amino acid: number, molar
       percent
       
   DayhoffStat is the amino acid's molar percent divided by its Dayhoff
   statistic. The Dayhoff statistic is the amino acid's relative
   occurrence per 1000 aa, normalised to 100 by rls@ebi.ac.uk (original
   work from 1993).
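
   The arithmetic behind three of these figures can be sketched as
   follows. This is only an illustration, not the pepstats source: the
   function names are hypothetical, DayhoffStat is taken as molar percent
   divided by the Dayhoff statistic (which is what the sample figures
   imply), and the Dayhoff statistic of 8.6 for alanine is an assumption
   inferred from the example report rather than read from the data file.

```python
def mole_percent(count, total_residues):
    """Molar percent of one residue type in the sequence."""
    return 100.0 * count / total_residues


def dayhoff_stat(count, total_residues, dayhoff_value):
    """Molar percent divided by the residue's Dayhoff statistic."""
    if dayhoff_value == 0:
        return 0.0
    return mole_percent(count, total_residues) / dayhoff_value


def average_residue_weight(molecular_weight, total_residues):
    """Molecular weight divided by the number of residues."""
    return molecular_weight / total_residues


# Figures taken from the LACI_ECOLI example report.
RESIDUES = 360
MOLECULAR_WEIGHT = 38563.98
ALA_COUNT = 44
ALA_DAYHOFF = 8.6  # assumed value, inferred from the sample output

print(f"Ala Mole%              = {mole_percent(ALA_COUNT, RESIDUES):.3f}")
print(f"Ala DayhoffStat        = {dayhoff_stat(ALA_COUNT, RESIDUES, ALA_DAYHOFF):.3f}")
print(f"Average Residue Weight = {average_residue_weight(MOLECULAR_WEIGHT, RESIDUES):.3f}")
```

   With these inputs the three values come out as 12.222, 1.421 and
   107.122, matching the alanine row and the header of the example
   report.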
   
Usage

   Here is a sample session with pepstats.
   
% pepstats
Protein statistics
Input sequence: sw:laci_ecoli
Output file [laci_ecoli.pepstats]:

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         sequence   Sequence USA
   -outfile            outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -[no]termini        bool       Include charge at N and C terminus
   -aadata             string     Molecular weight data for amino acids

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Qualifier       Description                            Allowed values          Default
   Mandatory qualifiers
   [-sequencea]    Sequence USA                           Readable sequence       Required
   (Parameter 1)
   -outfile        Output file name                       Output file             <sequence>.pepstats
   Optional qualifiers
   (none)
   Advanced qualifiers
   -[no]termini    Include charge at N and C terminus     Yes/No                  Yes
   -aadata         Molecular weight data for amino acids  Any string is accepted  Eamino.dat
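
   For scripted use, the qualifiers above can be assembled into a command
   line. The sketch below only builds the argument list and does not run
   anything; the helper name build_pepstats_command is hypothetical, and
   it assumes the sequence USA may be given as the first positional
   parameter, as the "(Parameter 1)" entry suggests.

```python
import shlex


def build_pepstats_command(sequence, outfile, termini=True, aadata=None):
    """Return an argument list for pepstats using the documented qualifiers."""
    args = ["pepstats", sequence, "-outfile", outfile]
    if not termini:
        # -[no]termini defaults to Yes, so only the negated form is needed.
        args.append("-notermini")
    if aadata is not None:
        args.extend(["-aadata", aadata])
    return args


cmd = build_pepstats_command(
    "sw:laci_ecoli", "laci_ecoli.pepstats", termini=False
)
print(shlex.join(cmd))
```

   On a machine with EMBOSS installed, the returned list could be passed
   to subprocess.run.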
   
Input file format

   Any protein sequence specified as a USA (Uniform Sequence Address).
   
Output file format

   Here is the output from the example run:
     _________________________________________________________________
   
PEPSTATS of LACI_ECOLI from 1 to 360

Molecular weight = 38563.98             Residues = 360
Average Residue Weight  = 107.122       Charge   = 1.5
Isoelectric Point = 6.8820

Residue         Number          Mole%           DayhoffStat
A = Ala         44              12.222          1.421
B = Asx         0               0.000           0.000
C = Cys         3               0.833           0.287
D = Asp         17              4.722           0.859
E = Glu         15              4.167           0.694
F = Phe         4               1.111           0.309
G = Gly         22              6.111           0.728
H = His         7               1.944           0.972
I = Ile         18              5.000           1.111
K = Lys         11              3.056           0.463
L = Leu         40              11.111          1.502
M = Met         10              2.778           1.634
N = Asn         12              3.333           0.775
P = Pro         14              3.889           0.748
Q = Gln         28              7.778           1.994
R = Arg         19              5.278           1.077
S = Ser         33              9.167           1.310
T = Thr         19              5.278           0.865
V = Val         34              9.444           1.431
W = Trp         2               0.556           0.427
X = Xxx         0               0.000           0.000
Y = Tyr         8               2.222           0.654
Z = Glx         0               0.000           0.000

Property        Residues                Number          Mole%
Tiny            (A+C+G+S+T)             121             33.611
Small           (A+B+C+D+G+N+P+S+T+V)   198             55.000
Aliphatic       (I+L+V)                 92              25.556
Aromatic        (F+H+W+Y)               21               5.833
Non-polar       (A+C+F+G+I+L+M+P+V+W+Y) 199             55.278
Polar           (D+E+H+K+N+Q+R+S+T+Z)   161             44.722
Charged         (B+D+E+H+K+R+Z)         69              19.167
Basic           (H+K+R)                 37              10.278
Acidic          (B+D+E+Z)               32               8.889
     _________________________________________________________________
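
   A report in this layout is straightforward to post-process. The
   sketch below is a minimal parser for the residue rows, followed by a
   recomputation of one property class from the parsed counts. The
   regular expression and the function names are assumptions based only
   on the example above, so other pepstats versions may need a different
   pattern.

```python
import re

# Matches residue rows such as
# "A = Ala         44              12.222          1.421".
RESIDUE_ROW = re.compile(
    r"^([A-Z]) = (\w{3})\s+(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$"
)


def parse_residue_counts(report_text):
    """Return {one_letter_code: count} from the residue table of a report."""
    counts = {}
    for line in report_text.splitlines():
        match = RESIDUE_ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(3))
    return counts


def class_total(counts, members):
    """Sum the counts for a property class written like "I+L+V"."""
    return sum(counts.get(code, 0) for code in members.split("+"))


# Three rows copied from the example report.
sample = """\
Residue         Number          Mole%           DayhoffStat
I = Ile         18              5.000           1.111
L = Leu         40              11.111          1.502
V = Val         34              9.444           1.431
"""

counts = parse_residue_counts(sample)
aliphatic = class_total(counts, "I+L+V")
print(aliphatic, f"{100.0 * aliphatic / 360:.3f}")
```

   For the three rows shown, this gives 92 residues and 25.556 mole
   percent, the same figures as the Aliphatic line of the example report.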
   
Data files

   The Dayhoff statistic is read from the EMBOSS data file
   'Edayhoff.freq'. You can inspect and modify this file by copying it
   into your current directory with the command: 'embossdata -fetch'.
   
   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.
   
   Users can provide their own data files in their own directories.
   Project-specific files can be put in the current directory or, for
   tidier directory listings, in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory or,
   again, in a subdirectory called ".embossdata".
   
   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
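
   The search order can be pictured with the short sketch below. It is
   not the EMBOSS implementation: the function find_data_file is
   hypothetical, and falling back to the standard data directory after
   the four user directories is an assumption.

```python
import os
import tempfile


def find_data_file(name, cwd, home, emboss_data=None):
    """Return the first existing path for `name`, or None if not found."""
    candidates = [
        os.path.join(cwd, name),
        os.path.join(cwd, ".embossdata", name),
        os.path.join(home, name),
        os.path.join(home, ".embossdata", name),
    ]
    # Assumption: the standard data directory is consulted last.
    if emboss_data:
        candidates.append(os.path.join(emboss_data, name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Demonstration with throwaway directories standing in for the real ones.
with tempfile.TemporaryDirectory() as cwd, tempfile.TemporaryDirectory() as home:
    os.mkdir(os.path.join(home, ".embossdata"))
    home_copy = os.path.join(home, ".embossdata", "Edayhoff.freq")
    open(home_copy, "w").close()
    found_home = find_data_file("Edayhoff.freq", cwd, home)

    # A copy in the current directory takes precedence over the home copy.
    local_copy = os.path.join(cwd, "Edayhoff.freq")
    open(local_copy, "w").close()
    found_local = find_data_file("Edayhoff.freq", cwd, home)

    print(found_home == home_copy, found_local == local_copy)
```

   The demonstration first finds the copy under the home directory, then
   prefers the copy in the current directory once one exists.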
       
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with a status of 0.
   
Known bugs

   None.
   
See also

   Program name   Description
   backtranseq    Back translate a protein sequence
   charge         Protein charge plot
   checktrans     Reports STOP codons and ORF statistics of a protein sequence
   compseq        Counts the composition of dimer/trimer/etc words in a sequence
   emowse         Protein identification by mass spectrometry
   freak          Residue/base frequency table or plot
   iep            Calculates the isoelectric point of a protein
   mwfilter       Filter noisy molwts from mass spec output
   octanol        Displays protein hydropathy
   pepinfo        Plots simple amino acid properties in parallel
   pepwindow      Displays protein hydropathy
   pepwindowall   Displays protein hydropathy of a set of sequences
   
Author(s)

   This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk).
   
History

   Written (1999) - Alan Bleasby
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
