
                               EMBOSS: chips
     _________________________________________________________________
   
                                 Program chips
                                       
Function

   Codon usage statistics
   
Description

   chips calculates Frank Wright's Nc statistic for the effective number
   of codons used (ref 1).
   
   This is a simple measure that quantifies how far the codon usage of a
   gene departs from equal usage of synonymous codons. This measure of
   synonymous codon usage bias, the 'effective number of codons used in a
   gene', Nc, can be easily calculated from codon usage data alone, and
   is independent of gene length and amino acid (aa) composition. Nc can
   take values from 20, in the case of extreme bias where one codon is
   exclusively used for each aa, to 61 when the use of alternative
   synonymous codons is equally likely. Nc thus provides an intuitively
   meaningful measure of the extent of codon preference in a gene.
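
   The two limits quoted above (20 and 61) can be checked with a minimal
   Python sketch of the Nc formula from ref 1. It is an illustration, not
   the chips source code. The function name, the use of the standard
   genetic code, the skipping of amino acids seen fewer than twice and
   the error raised for an empty degeneracy class are all assumptions
   made for this sketch; chips may treat missing amino acids differently.

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def effective_number_of_codons(codon_counts):
    """Return Nc for a mapping of DNA codon -> count (hypothetical helper)."""
    # Group the codon counts by encoded amino acid, skipping stop codons.
    by_aa = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            by_aa[aa].append(codon_counts.get(codon, 0))

    # Homozygosity F of each amino acid, keyed by its number of codons k.
    f_by_class = defaultdict(list)
    for counts in by_aa.values():
        k, n = len(counts), sum(counts)
        if k == 1 or n < 2:
            continue  # Met/Trp add a constant 2; n < 2 is uninformative
        f = (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)
        if f > 0:  # assumption: non-positive F values are ignored
            f_by_class[k].append(f)

    # Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, with F averaged within each class.
    nc = 2.0
    for k, members in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not f_by_class[k]:
            raise ValueError(f"no usable amino acid with {k} synonymous codons")
        nc += members / (sum(f_by_class[k]) / len(f_by_class[k]))
    return min(nc, 61.0)  # finite samples can overshoot; cap at 61
```

   With one codon used exclusively for every amino acid, each F equals 1
   and the sum is 2 + 9 + 1 + 5 + 3 = 20. With equal usage, each F
   approaches 1/k and the sum approaches 2 + 18 + 3 + 20 + 18 = 61.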
   
   The Nc statistic has problems with very short sequences (20 amino acids
   or fewer) that are yet to be fully resolved. They are caused by the
   need to account for amino acids that are missing from the sequence.
   
   This calculation was originally in the EGCG package as "codfish"
   (codon usage for fission yeast). As Frank Wright is a vegan, we looked
   for a meat-free name for the EMBOSS version: "chips". The official
   explanation is "Codon Heterozygosity (Inverse of) in a Protein-coding
   Sequence".
   
Usage

   Here is a sample session with chips. If the sequence extends beyond
   the coding region, the start and/or end positions of the CDS must
   be provided, because chips analyses only protein-coding regions.
   
% chips -sbeg 135 -send 1292
Input sequence: embl:paamir
Output file [paamir.chips]:
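
   The same session could be driven from a script. The following Python
   sketch builds the argument list from the qualifiers shown in this
   document (-seqall, -outfile, -sbeg, -send). The helper names are
   hypothetical, and the sketch assumes that chips is on the PATH and
   does not prompt when both mandatory qualifiers are supplied.

```python
import subprocess


def build_chips_command(usa, outfile, sbeg=None, send=None):
    """Assemble an argument list for a chips run (hypothetical helper)."""
    cmd = ["chips", "-seqall", usa, "-outfile", outfile]
    # Restrict the analysis to the CDS when its bounds are known.
    if sbeg is not None:
        cmd += ["-sbeg", str(sbeg)]
    if send is not None:
        cmd += ["-send", str(send)]
    return cmd


def run_chips(usa, outfile, sbeg=None, send=None):
    """Run chips; requires an EMBOSS installation with chips on the PATH."""
    subprocess.run(build_chips_command(usa, outfile, sbeg, send), check=True)
```

   For the session above, the call would be
   run_chips("embl:paamir", "paamir.chips", sbeg=135, send=1292).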

Command line arguments

   Mandatory qualifiers:
  [-seqall]            seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -cfile              codon      Codon usage file
   -window             integer    Averaging window

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Qualifier                 Description            Allowed values                        Default
   Mandatory qualifiers
   [-seqall] (Parameter 1)   Sequence database USA  Readable sequence(s)                  Required
   [-outfile] (Parameter 2)  Output file name       Output file                           <sequence>.chips
   Optional qualifiers
   (none)
   Advanced qualifiers
   -cfile                    Codon usage file       Codon usage file in EMBOSS data path  Ehum.cut
   -window                   Averaging window       Any integer value                     30
   
Input file format

   A nucleic acid sequence USA.
   
Output file format

   This is the output from the example run:
     _________________________________________________________________
   
# CHIPS codon usage statistics

Nc = 32.951
     _________________________________________________________________
   
   If all synonymous codons are used equally, the Nc value will be 61. If
   only one codon is used for each amino acid, the Nc value will be 20. Low
   values therefore indicate a strong codon bias, and high values indicate
   a low bias and possibly a non-coding region.
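
   A script that consumes this output only needs the "Nc = <value>" line.
   The Python sketch below extracts it. The function name is
   hypothetical, and returning a list (one value per "Nc" line) is an
   assumption intended to cover input with more than one sequence; only
   the single-sequence format shown above is documented here.

```python
import re

# Matches a line such as "Nc = 32.951", allowing surrounding whitespace.
NC_LINE = re.compile(r"^\s*Nc\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*$")


def parse_chips_output(text):
    """Return every Nc value found in chips output text, in file order."""
    values = []
    for line in text.splitlines():
        match = NC_LINE.match(line)
        if match:
            values.append(float(match.group(1)))
    return values


sample = "# CHIPS codon usage statistics\n\nNc = 32.951\n"
print(parse_chips_output(sample))  # prints [32.951]
```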
   
Data files

   chips reads a codon usage file, but uses it only as a template: the
   data originally stored in the file are ignored.
   
   The codon usage table is by default the file "CODONS/Ehum.cut" in the
   EMBOSS distribution directory.
   
   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.
   
   Users can provide their own data files in their own directories.
   Project specific files can be put in the current directory, or for
   tidier directory listings in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory, or
   again in a subdirectory called ".embossdata".
   
   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
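
   The search order listed above can be sketched in Python as follows.
   This is an illustration of the lookup order only, not the EMBOSS
   implementation. The function name and its parameters are
   hypothetical, and the standard EMBOSS data directory is not included
   in the sketch.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first match for name in the listed directories, or None."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    # Same order as the list above: ., .embossdata, ~/, ~/.embossdata
    for directory in (cwd, cwd / ".embossdata", home, home / ".embossdata"):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

   A project-specific copy in the current directory therefore takes
   precedence over a copy of the same name in the home directory.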
       
Notes

   None.
   
References

    1. Wright, F. (1990) Gene 87:23-29 "The 'effective number of codons'
       used in a gene."
       
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with a status of 0.
   
Known bugs

   None.
   
See also

   Program name                  Description
   cai          CAI codon adaptation index
   codcmp       Codon usage table comparison
   cusp         Create a codon usage table
   syco         Synonymous codon usage Gribskov statistic plot
   
Author(s)

   This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk).
   
History

   1999 - Written - Alan Bleasby.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
