
                              EMBOSS: garnier
     _________________________________________________________________
   
                                Program garnier
                                       
Function

   Predicts protein secondary structure
   
Description

   This is an implementation of the original Garnier Osguthorpe Robson
   algorithm (GOR I) for predicting protein secondary structure.
   
   Secondary structure prediction is notoriously difficult to do
   accurately. The GOR I algorithm was one of the first semi-successful
   methods.
   
   The Garnier method is not regarded as the most accurate prediction
   method, but it is simple to calculate on most workstations.
   
   The accuracy of any secondary structure prediction program is not much
   better than 70% to 80% at best. This is an early algorithm and will
   probably not predict with much better than about 65% accuracy.
   
   The Web servers for PHD, DSC, and others are generally preferred.
   
   Do not rely on this (or any other) program alone to make your
   predictions. Use several programs and take a consensus of the
   results.
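
   One simple way to take a consensus is a per-residue majority vote.
   The sketch below assumes that each program's prediction has already
   been converted to a string of one-letter states (H, E, T, C) of the
   same length as the sequence. The function name and the tie-breaking
   rule (fall back to coil) are illustrative assumptions and are not
   part of garnier or EMBOSS.

```python
from collections import Counter


def consensus(predictions, tie="C"):
    """Majority vote per residue over equal-length state strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        best, best_n = counts[0]
        # If two states share first place, fall back to the tie state
        # (an assumed rule; other tie-breaks are equally defensible).
        if len(counts) > 1 and counts[1][1] == best_n:
            best = tie
        result.append(best)
    return "".join(result)
```

   For example, consensus(["HHHEEC", "HHEEEC", "HHHECC"]) keeps the
   state that at least two of the three predictions agree on at each
   position.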
   
Usage

   Here is a sample session with garnier.

% garnier
Input sequence: sw:amic_pseae
Output file [amic_pseae.garnier]:

Command line arguments

   Mandatory qualifiers:
  [-sequencea]         seqall     Sequence database USA
  [-outfile]           report     (no help text) report value

   Optional qualifiers: (none)
   Advanced qualifiers:
   -idc                integer    idc param

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers                  Allowed values         Default
   [-sequencea] (Parameter 1)            Readable sequence(s)   Required
     Sequence database USA
   [-outfile] (Parameter 2)              Report file
     (no help text) report value

   Optional qualifiers                   Allowed values         Default
   (none)

   Advanced qualifiers                   Allowed values         Default
   -idc                                  Integer from 0 to 6    0
     idc param
   
   The meaning and use of the parameter 'idc' are currently being
   investigated. The original author, Bill Pearson, writes:
   
   "In their paper, GOR mention that if you know something about the
   secondary structure content of the protein you are analyzing, you can
   do better in prediction. "idc" is an index into a set of arrays,
   dharr[] and dsarr[], which provide "decision constants" (dch, dcs),
   which are offsets that are applied to the weights for the helix and
   sheet (extend) terms. So, idc=0 says don't use the decision constant
   offsets, and idc=1 to 6 indicates that various combinations of dch,dcs
   offsets should be used. I don't remember what they are, but I must
   have gotten the values from their paper."
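
   The mechanism described in the quotation can be sketched as follows.
   This is an illustration only, not the garnier source: the real
   contents of dharr[] and dsarr[] are not given here, so the tables
   below are zero-filled placeholders, and the sign convention
   (subtracting the constants from the helix and sheet scores) is an
   assumption. The per-residue scores passed in are likewise
   hypothetical.

```python
# Placeholder tables: the real dharr[]/dsarr[] values are NOT known here.
# Zeros mean "no offset", so the defaults behave like idc=0 for every index.
PLACEHOLDER_DHARR = [0, 0, 0, 0, 0, 0, 0]
PLACEHOLDER_DSARR = [0, 0, 0, 0, 0, 0, 0]


def predict_state(scores, idc=0,
                  dharr=PLACEHOLDER_DHARR, dsarr=PLACEHOLDER_DSARR):
    """Pick the best state for one residue after applying decision constants.

    scores maps "H", "E", "T", "C" to the summed weights for that residue.
    """
    if not 0 <= idc <= 6:
        raise ValueError("idc must be an integer from 0 to 6")
    dch, dcs = dharr[idc], dsarr[idc]
    adjusted = dict(scores)
    adjusted["H"] -= dch  # assumed sign: offset lowers the helix score
    adjusted["E"] -= dcs  # assumed sign: offset lowers the sheet score
    # Ties are resolved in the order H, E, T, C (an arbitrary choice).
    return max("HETC", key=lambda state: adjusted[state])
```

   With idc=0 both constants are zero and the state with the highest
   raw score wins, which matches the "DCH = 0, DCS = 0" line in the
   report header of a default run.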
   
Input file format

   Any protein sequence.
   
Output file format

   The output is a standard EMBOSS report file.
   
   The results can be output in one of several styles by using the
   command-line qualifier -rformat xxx, where 'xxx' is replaced by the
   name of the required format. The available format names are: embl,
   genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel,
   feattable, motif, regions, seqtable, simple, srs, table, tagseq
   
   See:
   http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for
   further information on report formats.
   
   By default garnier writes a 'tagseq' report file.
   
   Here is the output from the example run.
  __________________________________________________________________________

########################################
# Program: garnier
# Rundate: Mon Feb 11 13:42:25 2002
# Report_file: amic_pseae.garnier
########################################

#=======================================
#
# Sequence: AMIC_PSEAE     from: 1   to: 384
# HitCount: 113
#
# DCH = 0, DCS = 0
#
#  Please cite:
#  Garnier, Osguthorpe and Robson (1978) J. Mol. Biol. 120:97-120
#
#
#
#=======================================


          .   10    .   20    .   30    .   40    .   50
      GSHQERPLIGLLFSETGVTADIERSQRYGALLAVEQLNREGGVGGRPIET
helix                    HHHHH        HHHHH
sheet      EE EEEEE                 EE              EEEE
turns        T                TTTT          TTTT
 coil CCCCC        CCCCCC         CC       C    CCCC

          .   60    .   70    .   80    .   90    .  100
      LSQDPGGDPDRYRLCAEDFIRNRGVRFLVGCYMSHTRKAVMPVVERADAL
helix               HHHHHH            HHHH H     HHHHHH
sheet E         EEEE           EEEE          EEEE      E
turns  TT TT   T          TTTTT    TTT    T T
 coil    C  CCC

          .  110    .  120    .  130    .  140    .  150
      LCYPTPYEGFEYSPNIVYGGPAPNQNSAPLAAYLIRHYGERVVFIGSDYI
helix                              HHH
sheet EEE    E       EE           E   EEEE    EEEEE
turns       T TTT  TT  T     TT           TT T     TTTT
 coil    CCC     CC     CCCCC  CCC          C          C

          .  160    .  170    .  180    .  190    .  200
      YPRESNHVMRHLYRQHGGTVLEEIYIPLYPSDDDVQRAVERIYQARADVV
helix       HHHH                       HHHHHHHHHHHHH
sheet           EEE       EEEEEEE                   EEEE
turns   TTT        TTT             TTTT
 coil CC   C          CCCC       CC

          .  210    .  220    .  230    .  240    .  250
      FSTVVGTGTAELYRAIARRYGDGRRPPIASLTTSEAEVAKMESDVAEGQV
helix          HHHHHHH                HHHHHHHHHHHHHHHHH
sheet EEEE            EE         EEE                   E
turns                   TTTTTT
 coil     CCCCC               CCC   CC

          .  260    .  270    .  280    .  290    .  300
      VVAPYFSSIDTAASRAFVQACHGFFPENATITAWAEAAYWQTLLLGRAAQ
helix            HHHHHHH           HHHHHHHHHHHHH    HHHH
sheet EEEE  EEE         EE                      E
turns     TT              TTT   TT
 coil          CC            CCC  C              CCC

          .  310    .  320    .  330    .  340    .  350
      AAGSWRVEDVQRHLYDICIDAPQGPVRVERQNNHSRLSSRIAEIDARGVF
helix H     HHHH                                HHH
sheet                 EEEE     EEEEE         EEE      EE
turns           TTTTTT     T        TT   T         TTT
 coil  CCCCC              C CCC       CCC CCC

          .  360    .  370    .  380
      QVRWQSPEPIRPDPYVVVHNLDDWSASMGGGALP
helix
sheet EE           EEEEEEE     E       E
turns   TT    TT           TTT   TTT
  __________________________________________________________________________
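
   For further processing it is often convenient to collapse the four
   state rows of this report into one state letter per residue. The
   sketch below is a minimal parser for the layout shown above. It
   assumes a six-character label column (as in the example), and the
   function name is illustrative rather than part of EMBOSS. Residues
   with no mark on any row are reported as "-".

```python
# Map the row labels used in the example report to one-letter states.
LABELS = {"helix": "H", "sheet": "E", "turns": "T", "coil": "C"}
LABEL_WIDTH = 6  # assumed: the label column is 6 characters wide


def parse_garnier_blocks(text):
    """Return (sequence, states) read from the blocks of a garnier report."""
    sequence = ""
    states = []
    block_start = 0  # index of the first residue of the current block
    for line in text.splitlines():
        label = line[:LABEL_WIDTH].strip()
        body = line[LABEL_WIDTH:].rstrip()
        if label == "" and body.isalpha() and body.isupper():
            # Residue row: starts a new block.
            block_start = len(sequence)
            sequence += body
            states.extend("-" * len(body))
        elif label in LABELS:
            # State row: each non-blank column marks one residue.
            for offset, char in enumerate(body):
                index = block_start + offset
                if char != " " and index < len(states):
                    states[index] = LABELS[label]
    return sequence, "".join(states)
```

   Header lines beginning with '#', the ruler lines, and the separator
   lines are skipped because they match neither a residue row nor one
   of the four labels.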

Data files

   None.
   
Notes

   The 3D structure for the example sequence is known, although the
   secondary structure elements were not in the SwissProt feature table
   for release 38 when the test data was extracted.
   
   DSSP shows:
 From     To   Structure
    9     13   E beta sheet
   21     39   H alpha helix
   50     54   E beta sheet
   60     72   H alpha helix
   78     81   E beta sheet
   85     97   H alpha helix
  101    104   E beta sheet
  117    119   E beta sheet
  128    136   H alpha helix
  142    148   E beta sheet
  151    166   H alpha helix
  170    177   E beta sheet
  183    196   H alpha helix
  200    204   E beta sheet
  208    221   H alpha helix
  229    231   E beta sheet
  236    239   H alpha helix
  244    247   H alpha helix
  251    254   E beta sheet
  263    273   H alpha helix
  284    303   H alpha helix
  308    315   H alpha helix
  320    322   E beta sheet
  325    329   E beta sheet
  336    337   E beta sheet
  341    345   E beta sheet
  351    356   E beta sheet
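
   A table like this can be compared with a prediction residue by
   residue. The sketch below expands (from, to, state) segments into a
   per-residue string and reports the fraction of DSSP helix and sheet
   positions on which a predicted state string agrees. It assumes that
   the DSSP numbering matches the numbering of the predicted sequence,
   and that the prediction is already available as one letter per
   residue. Both function names are illustrative. Because the table
   lists only helix and sheet, this is not the conventional three-state
   accuracy measure.

```python
def segments_to_states(segments, length, default="-"):
    """Expand 1-based inclusive (start, end, state) segments to a string."""
    states = [default] * length
    for start, end, state in segments:
        for position in range(start, end + 1):
            states[position - 1] = state
    return "".join(states)


def agreement(predicted, observed, states="HE"):
    """Fraction of observed H/E positions where the prediction matches."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    pairs = [(p, o) for p, o in zip(predicted, observed) if o in states]
    if not pairs:
        raise ValueError("no observed positions in the requested states")
    return sum(p == o for p, o in pairs) / len(pairs)
```

   For instance, segments_to_states([(9, 13, "E"), (21, 39, "H")], 384)
   would encode the first two rows of the table for the 384-residue
   example sequence.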

References

   Garnier J, Osguthorpe DJ, Robson B. Analysis of the accuracy and
   implications of simple methods for predicting the secondary structure
   of globular proteins. J Mol Biol 1978 Mar 25;120(1):97-120.
   
Warnings

   The accuracy of any secondary structure prediction program is not much
   better than 70% to 80% at best. This is an early algorithm and will
   probably not predict with much better than about 65% accuracy.
   
   You are advised to use several of the latest Web-based prediction
   sites and combine them to make a consensus prediction.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with a status of 0.
   
Known bugs

   None.
   
See also

    Program name             Description
   helixturnhelix Report nucleic acid binding motifs
   hmoment        Hydrophobic moment calculation
   pepcoil        Predicts coiled coil regions
   pepnet         Displays proteins as a helical net
   pepwheel       Shows protein sequences as helices
   tmap           Displays membrane spanning regions
   
Author(s)

   This program ('GARNIER') was originally written by William Pearson
   (wrp@virginia.edu) and released as part of his FASTA package.
   
   This application was modified for inclusion in EMBOSS by Rodrigo Lopez
   (rls@ebi.ac.uk) European Bioinformatics Institute, Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
   
History

   None.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
