
                             EMBOSS: antigenic
     _________________________________________________________________
   
                               Program antigenic
                                       
Function

   Finds antigenic sites in proteins
   
Description

   Antigenic predicts potentially antigenic regions of a protein
   sequence, using the method of Kolaskar and Tongaonkar.
   
   Analysis of data from experimentally determined antigenic sites on
   proteins has revealed that the hydrophobic residues Cys, Leu and Val,
   if they occur on the surface of a protein, are more likely to be a
   part of antigenic sites. A semi-empirical method which makes use of
   physicochemical properties of amino acid residues and their
   frequencies of occurrence in experimentally known segmental epitopes
   was developed by Kolaskar and Tongaonkar to predict antigenic
   determinants on proteins. Application of this method to a large number
   of proteins has shown that it can predict antigenic determinants with
   about 75% accuracy, which is better than most of the known methods.
   The method is based on a single parameter and is thus very simple to
   use.
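
   As an illustration only, the following Python sketch outlines one
   reading of the Kolaskar and Tongaonkar procedure: average the
   per-residue antigenic propensity over a sliding window, choose a
   threshold from the whole-protein average, and report runs of at least
   the minimum length that stay at or above the threshold. The function
   name predict_regions, the window of 7 residues and the threshold rule
   are assumptions made for this sketch; it is not the antigenic source
   code and need not reproduce its scores or region boundaries. The
   propensity values are the A(p) column of the default Eantigenic.dat
   file.

```python
# Antigenic propensity A(p) per residue, copied from the default Eantigenic.dat.
PROPENSITY = {
    "A": 1.064, "C": 1.412, "D": 0.866, "E": 0.851, "F": 1.091,
    "G": 0.874, "H": 1.105, "I": 1.152, "K": 0.930, "L": 1.250,
    "M": 0.826, "N": 0.776, "P": 1.064, "Q": 1.015, "R": 0.873,
    "S": 1.012, "T": 0.909, "V": 1.383, "W": 0.893, "Y": 1.161,
}


def predict_regions(seq, window=7, minlen=6):
    """Return (start, end, mean_score) tuples with 1-based inclusive positions.

    Hypothetical helper; raises KeyError on residues outside the 20 standard ones.
    """
    seq = seq.upper()
    half = window // 2
    n = len(seq)
    if n < window:
        return []

    # Smoothed score for every residue that has a full window around it.
    scores = {}
    for centre in range(half, n - half):
        segment = seq[centre - half:centre + half + 1]
        scores[centre] = sum(PROPENSITY[r] for r in segment) / window

    # Assumed threshold rule: 1.0, or the protein-wide average if that is lower.
    average = sum(scores.values()) / len(scores)
    threshold = min(1.0, average)

    regions, run = [], []
    # The range runs one step past the last scored residue to flush the final run.
    for centre in range(half, n - half + 1):
        if centre in scores and scores[centre] >= threshold:
            run.append(centre)
            continue
        if len(run) >= minlen:
            mean = sum(scores[i] for i in run) / len(run)
            regions.append((run[0] + 1, run[-1] + 1, round(mean, 3)))
        run = []
    return regions
```

   Calling predict_regions on a protein sequence string returns the
   candidate regions in sequence order; sorting them by score would
   give an ordering similar to the report shown later in this document.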
   
Usage

   Here is a sample session with antigenic.

% antigenic
Finds antigenic sites in proteins
Input sequence: sw:act1_fugru
Minimum length [6]:
Output file [act1_fugru.antigenic]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
   -minlen             integer    Minimum length
  [-outfile]           report     (no help text) report value

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers                         Allowed values        Default
   [-sequence]     Sequence database USA        Readable sequence(s)  Required
   (Parameter 1)
   -minlen         Minimum length               Integer from 1 to 50  6
   [-outfile]      (no help text) report value  Report file
   (Parameter 2)

   Optional qualifiers                          Allowed values        Default
   (none)

   Advanced qualifiers                          Allowed values        Default
   (none)
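
   For scripted use, the qualifiers listed above can be given on the
   command line instead of being entered at the prompts. The Python
   sketch below builds such a command line and checks the documented
   range for -minlen. The helper names antigenic_command and
   run_antigenic are hypothetical, and the sketch assumes that an
   antigenic executable is on the PATH and that supplying every
   mandatory qualifier is enough to avoid prompting.

```python
import subprocess


def antigenic_command(sequence, outfile, minlen=6, rformat=None):
    """Build an argument list using only qualifiers documented on this page."""
    if not 1 <= minlen <= 50:
        raise ValueError("minlen must be an integer from 1 to 50")
    argv = ["antigenic", "-sequence", sequence,
            "-minlen", str(minlen), "-outfile", outfile]
    if rformat is not None:
        # Optional report format, for example "gff".
        argv += ["-rformat", rformat]
    return argv


def run_antigenic(sequence, outfile, **options):
    """Run antigenic; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(antigenic_command(sequence, outfile, **options), check=True)
```

   For example, run_antigenic("sw:act1_fugru", "act1_fugru.antigenic")
   corresponds to the sample session in the Usage section.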
   
Input file format

   The input sequence can be one or more protein sequences.
   
Output file format

   The output is a standard EMBOSS report file.
   
   The results can be output in one of several styles by using the
   command-line qualifier -rformat xxx, where 'xxx' is replaced by the
   name of the required format. The available format names are: embl,
   genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel,
   feattable, motif, regions, seqtable, simple, srs, table, tagseq.
   
   See:
   http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for
   further information on report formats.
   
   By default antigenic writes a 'motif' report file.
   
   The output from the above example is:
     _________________________________________________________________
   
########################################
# Program: antigenic
# Rundate: Mon Feb 11 12:01:10 2002
# Report_file: act1_fugru.antigenic
########################################

#=======================================
#
# Sequence: ACT1_FUGRU     from: 1   to: 375
# HitCount: 18
#=======================================

Max_score_pos at "*"

(1) Score 1.207 length 9 at residues 214->222
               *
 Sequence: EKLCYVALD
           |       |
         214       222

(2) Score 1.187 length 15 at residues 131->145
                 *
 Sequence: AMYVAIQAVLSLYAS
           |             |
         131             145

(3) Score 1.166 length 8 at residues 5->12
              *
 Sequence: IAALVVDN
           |      |
           5      12

(4) Score 1.164 length 12 at residues 27->38
                *
 Sequence: PRAVFPSIVGRP
           |          |
          27          38

(5) Score 1.136 length 24 at residues 160->183
                        *
 Sequence: THTVPIYEGYALPHAILRLDLAGR
           |                      |
         160                      183

(6) Score 1.135 length 6 at residues 367->372
                *
 Sequence: PSIVHR
           |    |
         367    372

(7) Score 1.116 length 16 at residues 93->108
                     *
 Sequence: ELRVAPEEHPVLLTEA
           |              |
          93              108

(8) Score 1.113 length 7 at residues 295->301
            *
 Sequence: ANTVLSG
           |     |
         295     301

(9) Score 1.110 length 11 at residues 256->266
                   *
 Sequence: RCPEALFQPSF
           |         |
         256         266

(10) Score 1.107 length 17 at residues 336->352
                      *
 Sequence: KYSVWIGGSILASLSTF
           |               |
         336               352

(11) Score 1.102 length 15 at residues 62->76
                 *
 Sequence: RGILTLKYPIEHGIV
           |             |
          62             76

(12) Score 1.086 length 19 at residues 232->250
                        *
 Sequence: SSSSLEKSYELPDGQVITI
           |                 |
         232                 250

(13) Score 1.083 length 6 at residues 327->332
              *
 Sequence: IKIIAP
           |    |
         327    332

(14) Score 1.074 length 7 at residues 317->323
              *
 Sequence: ITALAPS
           |     |
         317     323

(15) Score 1.068 length 7 at residues 186->192
                *
 Sequence: TDYLMKI
           |     |
         186     192

(16) Score 1.066 length 7 at residues 40->46
              *
 Sequence: HQGVMVG
           |     |
          40     46

(17) Score 1.045 length 7 at residues 269->275
           *
 Sequence: MESCGIH
           |     |
         269     275

(18) Score 1.034 length 7 at residues 51->57
            *
 Sequence: DSYVGDE
           |     |
          51     57


#---------------------------------------
#---------------------------------------
     _________________________________________________________________
   
   By using the '-rformat gff' qualifier, a GFF file of the predicted
   regions can be produced. For example:
   
% antigenic -rformat gff
Finds antigenic sites in proteins
Input sequence(s): sw:act1_fugru
Minimum length [6]:
Output file [act1_fugru.antigenic]:

% more act1_fugru.antigenic
##gff-version 2.0
##date 2002-02-11
##Type Protein ACT1_FUGRU
ACT1_FUGRU      antigenic       site    214     222     1.207   +       .
Sequence "ACT1_FUGRU.1" ; note "*pos 218"
ACT1_FUGRU      antigenic       site    131     145     1.187   +       .
Sequence "ACT1_FUGRU.2" ; note "*pos 137"
ACT1_FUGRU      antigenic       site    5       12      1.166   +       .
Sequence "ACT1_FUGRU.3" ; note "*pos 8"
ACT1_FUGRU      antigenic       site    27      38      1.164   +       .
Sequence "ACT1_FUGRU.4" ; note "*pos 32"
ACT1_FUGRU      antigenic       site    160     183     1.136   +       .
Sequence "ACT1_FUGRU.5" ; note "*pos 173"
ACT1_FUGRU      antigenic       site    367     372     1.135   +       .
Sequence "ACT1_FUGRU.6" ; note "*pos 372"
ACT1_FUGRU      antigenic       site    93      108     1.116   +       .
Sequence "ACT1_FUGRU.7" ; note "*pos 103"
ACT1_FUGRU      antigenic       site    295     301     1.113   +       .
Sequence "ACT1_FUGRU.8" ; note "*pos 296"
ACT1_FUGRU      antigenic       site    256     266     1.110   +       .
Sequence "ACT1_FUGRU.9" ; note "*pos 264"
ACT1_FUGRU      antigenic       site    336     352     1.107   +       .
Sequence "ACT1_FUGRU.10" ; note "*pos 347"
ACT1_FUGRU      antigenic       site    62      76      1.102   +       .
Sequence "ACT1_FUGRU.11" ; note "*pos 68"
ACT1_FUGRU      antigenic       site    232     250     1.086   +       .
Sequence "ACT1_FUGRU.12" ; note "*pos 245"
ACT1_FUGRU      antigenic       site    327     332     1.083   +       .
Sequence "ACT1_FUGRU.13" ; note "*pos 330"
ACT1_FUGRU      antigenic       site    317     323     1.074   +       .
Sequence "ACT1_FUGRU.14" ; note "*pos 320"
ACT1_FUGRU      antigenic       site    186     192     1.068   +       .
Sequence "ACT1_FUGRU.15" ; note "*pos 191"
ACT1_FUGRU      antigenic       site    40      46      1.066   +       .
Sequence "ACT1_FUGRU.16" ; note "*pos 43"
ACT1_FUGRU      antigenic       site    269     275     1.045   +       .
Sequence "ACT1_FUGRU.17" ; note "*pos 269"
ACT1_FUGRU      antigenic       site    51      57      1.034   +       .
Sequence "ACT1_FUGRU.18" ; note "*pos 52"
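
   The GFF records above can be read with a few lines of Python. The
   sketch below is an assumption-laden illustration: the function name
   parse_gff is hypothetical, fields are split on any whitespace, and a
   line that does not look like a feature record is treated as the
   continuation of the previous record's attribute field, which is how
   the listing above is laid out.

```python
def parse_gff(text):
    """Return one dict per feature record in GFF text laid out as shown above."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and "##" header lines
        fields = line.split(None, 8)
        if len(fields) >= 8 and fields[3].isdigit() and fields[4].isdigit():
            records.append({
                "seqname": fields[0], "source": fields[1], "feature": fields[2],
                "start": int(fields[3]), "end": int(fields[4]),
                "score": float(fields[5]), "strand": fields[6],
                "frame": fields[7],
                "attributes": fields[8] if len(fields) > 8 else "",
            })
        elif records:
            # Continuation line: append it to the previous record's attributes.
            joined = records[-1]["attributes"] + " " + line.strip()
            records[-1]["attributes"] = joined.strip()
    return records
```

   Each returned record carries the region start, end and score, so the
   predicted regions can be filtered or sorted without re-running the
   program.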

Data files

   Antigenic uses a data file called Eantigenic.dat.
   
   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.
   
   Users can provide their own data files in their own directories.
   Project-specific files can be put in the current directory or, for
   tidier directory listings, in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory or,
   again, in a subdirectory called ".embossdata".
   
   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
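
   The search order above can be mimicked in a script that needs to know
   which copy of a data file would be picked up. The following Python
   sketch is illustrative only: the function name find_data_file is
   hypothetical, and falling back to the directory named by EMBOSS_DATA
   after the four user directories is an assumption, not something this
   page states.

```python
import os


def find_data_file(name="Eantigenic.dat", cwd=None, home=None):
    """Return the first existing path for `name`, or None if none is found."""
    cwd = cwd or os.getcwd()
    home = home or os.path.expanduser("~")
    # User directories, in the order listed in the documentation.
    candidates = [
        os.path.join(cwd, name),
        os.path.join(cwd, ".embossdata", name),
        os.path.join(home, name),
        os.path.join(home, ".embossdata", name),
    ]
    # Assumed fallback: the standard data directory, if EMBOSS_DATA is set.
    standard = os.environ.get("EMBOSS_DATA")
    if standard:
        candidates.append(os.path.join(standard, name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

   A file in the current directory therefore takes precedence over one
   with the same name in ~/.embossdata.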
       
   Here is the default Eantigenic.dat file:

#                                               Antigenic  Surface  Antigenic
# Amino     -- Occurrence of amino acids in --   frequency frequency propensity
# Acid       Epitopes      Surface     Protein   f(Ag)    f(s)      A(p)
  A             135          328         524     0.065    0.061     1.064
  C              53           97         186     0.026    0.018     1.412
  D             118          352         414     0.057    0.066     0.866
  E             132          401         499     0.064    0.075     0.851
  F              76          180         365     0.037    0.034     1.091
  G             116          343         487     0.056    0.064     0.874
  H              59          138         191     0.029    0.026     1.105
  I              86          193         437     0.042    0.036     1.152
  K             158          439         523     0.076    0.082     0.930
  L             149          308         684     0.072    0.058     1.250
  M              23           72         152     0.011    0.013     0.826
  N              94          313         407     0.045    0.058     0.776
  P             135          328         411     0.065    0.061     1.064
  Q              99          252         332     0.048    0.047     1.015
  R             106          314         394     0.051    0.058     0.873
  S             168          429         553     0.081    0.080     1.012
  T             141          401         522     0.068    0.075     0.909
  V             128          239         515     0.062    0.045     1.383
  W              19           55         103     0.009    0.010     0.893
  Y              71          158         245     0.034    0.029     1.161
Total          2066         5340        7944
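
   The three right-hand columns of the table appear to be derived from
   the occurrence counts: f(Ag) is the epitope count divided by the
   epitope total, f(s) is the surface count divided by the surface
   total, and A(p) is their ratio. The Python sketch below checks that
   reading against the table; the function name propensity is
   hypothetical, and small differences in the third decimal place can
   occur for some rows.

```python
def propensity(epitope_count, surface_count,
               total_epitope=2066, total_surface=5340):
    """Return (f_Ag, f_s, A_p) computed from raw occurrence counts.

    The default totals are taken from the "Total" row of the table.
    """
    f_ag = epitope_count / total_epitope   # antigenic frequency
    f_s = surface_count / total_surface    # surface frequency
    return f_ag, f_s, f_ag / f_s           # A(p) as the ratio of the two


# Cysteine: 53 occurrences in epitopes, 97 on the surface.
f_ag, f_s, a_p = propensity(53, 97)
print(f"{f_ag:.3f} {f_s:.3f} {a_p:.3f}")
```

   For cysteine this agrees with the tabulated 0.026, 0.018 and 1.412.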

Notes

References

    1. Kolaskar,AS and Tongaonkar,PC (1990). A semi-empirical method for
       prediction of antigenic determinants on protein antigens. FEBS
       Letters 276: 172-174.
    2. Parker,JMR, Guo,D and Hodges,RS (1986). Biochemistry 25:
       5425-5432.
       
Warnings

   The program will warn you if the sequence is not a protein or has
   ambiguity codes.
   
Diagnostic Error Messages

Exit status

   It exits with status 0, unless a region is badly constructed.
   
Known bugs

   None.
   
See also

    Program name                        Description
   digest         Protein proteolytic enzyme or reagent cleavage digest
   fuzzpro        Protein pattern search
   fuzztran       Protein pattern search after translation
   helixturnhelix Report nucleic acid binding motifs
   oddcomp        Finds protein sequence regions with a biased composition
   patmatdb       Search a protein sequence with a motif
   patmatmotifs   Search a PROSITE motif database with a protein sequence
   pepcoil        Predicts coiled coil regions
   preg           Regular expression search of a protein sequence
   pscan          Scans proteins using PRINTS
   sigcleave      Reports protein signal cleavage sites
   
Author(s)

   This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)
   
   Original program "ANTIGENIC" by Peter Rice (EGCG 1991)
   
History

   Completed 9th March 1999
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
