
                               EMBOSS: banana
     _________________________________________________________________
   
                                Program banana
                                       
Function

   Bending and curvature plot in B-DNA
   
Description

   banana predicts bending of a normal (B) DNA double helix, using the
   method of Goodsell & Dickerson, NAR 1994 11;22(24):5497-5503.
   
   This program calculates the magnitude of local bending and macroscopic
   curvature at each point along an arbitrary B-DNA sequence, using any
   desired bending model that specifies values of twist, roll and tilt as
   a function of sequence.
   
   The default model, based on the nucleosome positioning data of
   Satchwell et al. 1986 (J. Mol. Biol. 191, 659-675), correctly predicts
   experimental A-tract curvature as measured by gel retardation and
   cyclization kinetics, and successfully predicts curvature in regions
   containing phased GGGCCC sequences. (This is model 'a' described in
   the Goodsell & Dickerson paper.)
   
   This model (local bending at mixed-sequence DNA, strong bends at the
   sequence GGC, and straight, rigid A-tracts) is the only one of the six
   models investigated in the Goodsell & Dickerson paper that is
   consistent with both solution data from gel retardation and
   cyclization kinetics and structural data from x-ray crystallography.
   
   The consensus sequence for DNA bending is alternating runs of five As
   and five non-As. "N" is the ambiguity code for any base, and "B" is
   the ambiguity code for "not A", so "BANANA" is itself a bent sequence,
   hence the name of this program.
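   To make the ambiguity codes concrete, here is a minimal Python sketch
   that turns an IUPAC pattern into a regular expression and checks
   whether a sequence fits it. It handles only the codes used in this
   section (A, C, G, T, B and N), and it encodes one unit of the bending
   consensus as the ten-base pattern AAAAABBBBB; that encoding and the
   helper names are illustrative assumptions, not part of banana.

```python
import re

# Subset of the IUPAC nucleotide ambiguity codes needed for this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]",   # "not A"
    "N": "[ACGT]",  # any base
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern such as 'BANANA' into a regular expression."""
    return "".join(IUPAC[ch] for ch in pattern.upper())


def matches(pattern: str, seq: str) -> bool:
    """Return True if seq is an instance of pattern over its whole length."""
    return re.fullmatch(iupac_to_regex(pattern), seq.upper()) is not None


# Hypothetical encoding of one unit of the bending consensus:
# five As followed by five non-As.
consensus = "AAAAABBBBB"
print(matches(consensus, "AAAAAGGCTC"))  # True
print(matches(consensus, "AAAAAGGCTA"))  # False: the last base must not be A
print(iupac_to_regex("BANANA"))          # [CGT]A[ACGT]A[ACGT]A
```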
   
   The program outputs both a graphical display and a text file of the
   results.
   
  Background
  
   Sequence-dependent DNA bending, like sequence-dependent protein
   folding, is a problem that remains frustratingly elusive. The issue
   has obvious biological importance in such matters as the winding of
   DNA in nucleosomes, or the recognition of particular DNA loci by
   restriction enzymes, repressors and other control proteins. The
   binding of the catabolite gene activator protein and of the TATA-box
   recognition protein to a DNA double helix are only two spectacular
   examples in which major bends in the helix are induced at specific
   sequence loci. It is of interest to consider whether the particular
   recognition sequences are bent even in the absence of proteins: a
   preformed bend in the DNA would form a custom site for protein
   binding, or an enhanced bendability of a given sequence would
   facilitate protein-induced bending.
   
   Two possible models of sequence-dependent bending in free DNA have
   been proposed in the past. Nearest neighbor models propose that
   large-scale measurable curvature may arise by the accumulation of many
   small local deformations in helical twist, roll, tilt and slide at
   individual steps between base pairs. Junction models, on the other
   hand, propose that bending occurs at the interface between two
   different structural variants of the B-DNA double helix. Note that in
   both of these models, sequences which are anisotropically bendable -
   for instance, sequences with steps that preferentially bend only to
   compress the major groove - will lead to an average structure which is
   similar to a sequence with a rigid, intrinsic bend. The Goodsell &
   Dickerson paper does not distinguish between these two possibilities.
   
   B-DNA has the special property of having its base pairs very nearly
   perpendicular to the overall helix axis. Hence the normal vector to
   each base pair can be taken as representing the local helix at that
   point, and curvature and bending can be studied simply by observing
   the behaviour of the normal vectors from one base to another along the
   helix. This is both easy to calculate and simple to interpret. This
   program displays the magnitude of bending and curvature at each point
   along the sequence. It is not intended as a substitute for more
   elaborate three-dimensional trajectory calculations, but only to
   express bending tendencies as a function of sequence. The power of
   this simple approach is in its ease of screening for regions of a
   given DNA sequence where phased local bends add constructively to form
   an overall curve.
   
   For purposes of clarity the terms bending and curvature will be used
   in a restricted sense here. Bending of DNA describes the tendency for
   successive base pairs to be non-parallel in an additive manner over
   several base pair steps. Bending most commonly is produced by a
   rolling of adjacent base pairs over one another about their long axis,
   although in principle, tilting of base pairs about their short axis
   could make a contribution. In contrast, curvature of DNA represents the
   tendency of the helix axis to follow a non-linear pathway over an
   appreciable length, in a manner that contributes to macroscopic
   behaviour such as gel retardation or ease of cyclization into DNA
   minicircles. The distinction between local bending and macroscopic
   curvature is illustrated (poorly) in the following figure (see figure
   1 of the Goodsell & Dickerson paper for a better view).
   
                       bend   bend   bend
                         -     -     -
  uncurved              / \   / \   / \
                  -----/   \-/   \-/   \-----
                          bend   bend




                    bend    bend
                     /-------\
                   /          \
  curved          |bend        |bend
                  |            |
                  |            |


   An x-ray crystal structure analysis cannot show curvature, but can and
   often does show local bending. On the other hand, gel electrophoresis
   and cyclization kinetics can detect macroscopic curvature, but not
   bending. A complete knowledge of local bending would permit the
   precise calculation of curvature, but a knowledge of macroscopic
   curvature alone does not allow one to specify precisely the local
   bending elements that produce it. This is one of the scale paradoxes
   that have plagued the DNA conformation field for a decade or more.
   There is more than a passing resemblance to a familiar problem of
   classical statistical mechanics: A complete knowledge of instantaneous
   positions and velocities of all molecules of a gas allows one to
   calculate bulk properties such as temperature, pressure and volume. But
   the most detailed knowledge of bulk properties cannot lead one to
   precise molecular positions. Many molecular arrangements can produce
   identical bulk properties, and in the present case, many bending
   combinations can produce identical macroscopic curvature.
   
  Method
  
   The program reads a sequence and a matrix of standard twist, roll and
   tilt angles for each type of base pair step. This matrix is entirely
   at the disposal of the user, and can be altered to represent any other
   DNA-bending model. The program creates a table or a graphical image of
   the bending and the curvature at each base step.
   
   The program begins by applying the indicated twist, roll and tilt at
   each step along the sequence, and calculating the resulting base pair
   normal vector. The first base pair is aligned with its normal vector
   along the z axis, with a twist value of 0.0 degrees. The specified
   twist is applied to the second base pair, and roll and tilt values are
   used to calculate its normal vector relative to the first. If either
   roll or tilt is non-zero, the new normal vector will be angled away
   from the z axis, producing the first 'bend'. The process is continued along the
   sequence, applying the appropriate twist, roll and tilt to each new
   base pair relative to its predecessor. The result is a list of normal
   vectors for all base pairs in the sequence.
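   The normal-vector construction can be sketched in Python as follows.
   This is not banana's source: the axis assignments (roll about the
   base-pair long axis, tilt about the short axis, twist about the
   normal) follow the definitions given in the Background section, but
   the order in which the three rotations are composed within a step is
   an assumption, and the function names are hypothetical.

```python
import math

Matrix = list[list[float]]
Vector = tuple[float, float, float]


def rotation(axis: str, degrees: float) -> Matrix:
    """Rotation matrix about the local x, y or z axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def base_pair_normals(steps: list[tuple[float, float, float]]) -> list[Vector]:
    """Return one unit normal per base pair from (twist, roll, tilt) steps in degrees.

    Assumed conventions (not taken from banana itself): local x is the
    base-pair long axis (roll), local y the short axis (tilt), local z the
    normal (twist); each step applies twist, then roll, then tilt in the
    frame of the previous base pair.
    """
    frame: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    normals: list[Vector] = [(0.0, 0.0, 1.0)]  # first base pair: normal along z
    for twist, roll, tilt in steps:
        step = matmul(rotation("z", twist),
                      matmul(rotation("x", roll), rotation("y", tilt)))
        frame = matmul(frame, step)
        # The third column of the frame is the new base-pair normal.
        normals.append((frame[0][2], frame[1][2], frame[2][2]))
    return normals


def angle_from_z(v: Vector) -> float:
    """Angle in degrees between v and the z axis."""
    return math.degrees(math.atan2(math.hypot(v[0], v[1]), v[2]))


# Two 10-degree rolls add up when in phase and cancel half a turn apart.
in_phase = base_pair_normals([(0.0, 10.0, 0.0), (0.0, 10.0, 0.0)])
out_of_phase = base_pair_normals([(0.0, 10.0, 0.0), (180.0, 10.0, 0.0)])
print(round(angle_from_z(in_phase[-1]), 6))      # 20.0
print(round(angle_from_z(out_of_phase[-1]), 6))  # 0.0
```

   The example shows the effect the method relies on: local bends that
   are in phase with the helix accumulate, while bends half a turn apart
   cancel.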
   
   Local bends are then calculated from the normal vectors. The bend for
   base N is calculated across a window from N-1 to N+1.
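   A matching sketch of the local-bend step, assuming the bend at base N
   is the angle between the normals of base pairs N-1 and N+1, and that
   the two end bases, which lack a neighbour on one side, are reported as
   0.0 (the sample output below does start with 0.0). The angle is
   computed with atan2 rather than acos so that small angles stay
   accurate; the helper names are illustrative.

```python
import math

Vector = tuple[float, float, float]


def angle_deg(u: Vector, v: Vector) -> float:
    """Angle in degrees between two vectors (atan2 form, robust near 0)."""
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))


def local_bends(normals: list[Vector]) -> list[float]:
    """Bend at base pair N = angle between the normals of N-1 and N+1.

    The first and last base pairs are reported as 0.0 (an assumption).
    """
    bends = [0.0] * len(normals)
    for n in range(1, len(normals) - 1):
        bends[n] = angle_deg(normals[n - 1], normals[n + 1])
    return bends


# Normals that swing 5 degrees per step in the x-z plane: each interior
# base pair sees a 10-degree bend across its three-base-pair window.
normals = [(math.sin(math.radians(5 * i)), 0.0, math.cos(math.radians(5 * i)))
           for i in range(6)]
print([round(b, 3) for b in local_bends(normals)])
# [0.0, 10.0, 10.0, 10.0, 10.0, 0.0]
```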
   
   Curvature is calculated in two steps. Base pair normals are first
   averaged over a 10-base-pair window to filter out the local writhing
   of the helix. The normals of the nine base pairs from N-4 to N+4, and
   the two base pairs N-5 and N+5 at half weight, are averaged and
   assigned to base pair N. Curvature then is calculated from these
   averaged normal vector values, using a bracket value, nc, with a value
   of 15. That is, the curvature at base pair N is the angle between
   averaged normal vectors at base pairs N-nc and N+nc.
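   The two-step curvature calculation can be sketched in Python as below.
   Following the description, normals are averaged over N-5..N+5 with
   half weight at the two ends, and the curvature at N is the angle
   between the averaged normals at N-15 and N+15. Reporting 0.0 where
   either window runs off the sequence is an assumption; with these
   window sizes the first defined value falls at the 21st base pair
   (index 20), which is consistent with the sample output below, where
   the first twenty curvature values are 0.0. The function names and the
   synthetic test helix are hypothetical.

```python
import math
from typing import Optional

Vector = tuple[float, float, float]

HALF_WINDOW = 5  # average over N-5..N+5; N-5 and N+5 get half weight
BRACKET = 15     # nc: curvature compares averaged normals at N-nc and N+nc


def angle_deg(u: Vector, v: Vector) -> float:
    """Angle in degrees between two vectors (atan2 form, robust near 0)."""
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))


def averaged_normals(normals: list[Vector]) -> list[Optional[Vector]]:
    """Weighted 11-point average at each base pair; None near the ends."""
    out: list[Optional[Vector]] = [None] * len(normals)
    for n in range(HALF_WINDOW, len(normals) - HALF_WINDOW):
        acc = [0.0, 0.0, 0.0]
        for k in range(-HALF_WINDOW, HALF_WINDOW + 1):
            w = 0.5 if abs(k) == HALF_WINDOW else 1.0
            for j in range(3):
                acc[j] += w * normals[n + k][j]
        # Total weight is 9 * 1.0 + 2 * 0.5 = 10.
        out[n] = (acc[0] / 10.0, acc[1] / 10.0, acc[2] / 10.0)
    return out


def curvature(normals: list[Vector], nc: int = BRACKET) -> list[float]:
    """Angle between averaged normals at N-nc and N+nc (0.0 where undefined)."""
    avg = averaged_normals(normals)
    curv = [0.0] * len(normals)
    for n in range(nc, len(normals) - nc):
        a, b = avg[n - nc], avg[n + nc]
        if a is not None and b is not None:
            curv[n] = angle_deg(a, b)
    return curv


# A uniformly curved test helix: the normal swings 1 degree per base pair.
normals = [(math.sin(math.radians(i)), 0.0, math.cos(math.radians(i)))
           for i in range(60)]
curv = curvature(normals)
print(next(i for i, c in enumerate(curv) if c > 0))  # 20
print(round(curv[20], 6))                            # 30.0
```

   Because the weighted window is symmetric, averaging removes the
   base-to-base writhing without shifting the direction of a uniformly
   curved axis, so the test helix reports 30 degrees (a 30 base pair
   bracket at 1 degree per base pair).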
   
Usage

   Here is a sample session with banana.
   
% banana embl:rnu68037
Bending and curvature plot in B-DNA
Graph type [x11]:

   
   Bending is shown as a solid line, curvature as a dotted line.
   
Command line arguments

   Mandatory qualifiers (* if not always prompted):
  [-sequence]          sequence   Sequence USA
*  -graph              graph      Graph type

   Optional qualifiers:
   -anglesfile         datafile   angles file
   -residuesperline    integer    Number of residues to be displayed on each
                                  line
   -outfile            outfile    Output file name

   Advanced qualifiers:
   -data               bool       Output as data

   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers

   [-sequence] (Parameter 1)
       Description:     Sequence USA
       Allowed values:  Readable sequence
       Default:         Required

   -graph
       Description:     Graph type
       Allowed values:  EMBOSS has a list of known devices, including
                        postscript, ps, hpgl, hp7470, hp7580, meta,
                        colourps, cps, xwindows, x11, tektronics, tekt,
                        tek4107t, tek, none, null, text, data, xterm, png
       Default:         EMBOSS_GRAPHICS value, or x11

   Optional qualifiers

   -anglesfile
       Description:     angles file
       Allowed values:  Data file
       Default:         Eangles_tri.dat

   -residuesperline
       Description:     Number of residues to be displayed on each line
       Allowed values:  Any integer value
       Default:         50

   -outfile
       Description:     Output file name
       Allowed values:  Output file
       Default:         banana.profile

   Advanced qualifiers

   -data
       Description:     Output as data
       Allowed values:  Yes/No
       Default:         No
   
Input file format

   Any DNA sequence USA.
   
Output file format

   The output is to both a graphical display and to a text file with the
   default name 'banana.profile'.
   
   The graphical display shows the sequence together with the local
   bending (solid line) and macroscopic curvature (dotted line).
   
   The output data file from the above example is as follows:
     _________________________________________________________________
   
t    0.0      0.0
g   17.7      0.0
a   21.1      0.0
g   28.5      0.0
c   26.2      0.0
c   19.7      0.0
c   18.7      0.0
c   12.5      0.0
t    9.7      0.0
a   14.9      0.0
c   16.5      0.0
g   17.5      0.0
g   26.2      0.0
g   28.5      0.0
c   20.7      0.0
t   11.7      0.0
t    6.4      0.0
a    9.3      0.0
a   14.9      0.0
c   17.7      0.0
c   15.7     19.2
t   15.7     18.5
g   17.7     17.9
a   21.1     17.1
g   28.5     15.9
c   25.2     14.6
c   12.5     13.3
t    7.2     11.9
a   13.2     10.8
g   20.1     10.1
t   19.5      9.6
g   15.1      9.2
g   14.9      9.1
a   19.5      9.5
t   19.7     10.2
g   17.7     10.8
a   17.7     11.0
g   25.2     11.2
g   26.2     11.3
c   15.3     11.5
a   11.4     11.7
a   14.5     12.0
c   13.9     12.2
a   11.4     12.3
[output truncated for brevity]
     _________________________________________________________________
   
   The data file consists of three columns separated by blanks or tab
   characters.
   
   The first column is the base at each position.
   The second column is the local bending.
   The third column is the curvature.
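   If you want to post-process the text output, a minimal Python parser
   for the three-column layout might look like the sketch below. It
   assumes the file looks like the sample above (one base, bend and
   curvature value per line); the ProfileRow and parse_profile names are
   hypothetical. To read a real run, pass it the contents of
   banana.profile, for example open("banana.profile").read().

```python
from typing import NamedTuple


class ProfileRow(NamedTuple):
    base: str
    bend: float       # local bending, degrees
    curvature: float  # macroscopic curvature, degrees


def parse_profile(text: str) -> list[ProfileRow]:
    """Parse banana's three-column text output (blank- or tab-separated)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # no argument: splits on any run of blanks or tabs
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a data row
        try:
            rows.append(ProfileRow(fields[0], float(fields[1]), float(fields[2])))
        except ValueError:
            continue  # non-numeric bend or curvature column
    return rows


sample = "t    0.0      0.0\ng\t17.7\t0.0\na   21.1      0.0\ng   28.5      0.0\n"
rows = parse_profile(sample)
peak = max(range(len(rows)), key=lambda i: rows[i].bend)
print(peak + 1, rows[peak])  # 4 ProfileRow(base='g', bend=28.5, curvature=0.0)
```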
   
Data files

   The program reads an angles file containing the twist, roll and tilt
   angles. By default Eangles_tri.dat is used, as in Goodsell & Dickerson
   (1994) Nucl. Acids Res. 22, 5497-5503, with values derived from the
   data of Satchwell, Drew & Travers (1986) J. Mol. Biol. 191, 659-675.
   
   The description of this bending model is as follows:
   
   The roll-tilt-twist parameters of this model are derived purely from
   experimental observations of sequence location preferences of base
   trimers in small circles of DNA, without reference to solution
   techniques that measure curvature per se. For this reason, they may be
   the most objective and unbiased parameters of all. Satchwell, Drew and
   Travers studied the positioning of DNA sequences wrapped around
   nucleosome cores, and in closed circles of double-helical DNA of
   comparable size. From the sequence data they calculated a fractional
   preference of each base pair triplet for a position 'facing out', or
   with the major groove on the concave side of the curved helix. The
   sequence GGC, for example, has a 45% preference for locations on a
   bent double helix in which its major groove faces inward and is
   compressed by the curvature (tending towards positive roll), whereas
   sequence AAA has a 36% preference for the opposite orientation, with
   major groove facing outward and with minor groove facing inward and
   compressed (tending toward negative roll). These fractional variances
   have been converted into roll angles in the following manner: Because
   x-ray crystal structure analysis uniformly indicates that AA steps are
   unbent, a zero roll is assigned to the AAA triplet; an arbitrary
   maximum roll of 10 degrees is assigned to GGC, and all other triplets
   are scaled in a linear manner. Where % is the percent-out figure,
   then:
   
         Roll = 10 degrees * (% + 36)/(45 + 36)

   Changing the maximum roll value will scale the entire profile up or
   down proportionately, but will not change the shape of the profile.
   Peaks will remain peaks, and valleys, valleys. The absolute magnitude
   of all the roll values is less important than their relative
   magnitude, or the order of roll preference. Twist angles were set to
   zero. Because these values correspond to base trimers, the values of
   roll, tilt and twist were applied to the first two bases for the
   calculation.
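   The percent-out to roll conversion is a simple linear map, sketched
   here in Python. The function name and its default arguments are
   illustrative; AAA is entered as -36 because its preference is for the
   opposite orientation, which is what the formula above implies when it
   assigns AAA zero roll.

```python
def roll_from_percent_out(pct: float, max_roll: float = 10.0,
                          pct_aaa: float = -36.0, pct_ggc: float = 45.0) -> float:
    """Linearly map a triplet's percent-out figure onto a roll angle in degrees.

    AAA (pct_aaa) maps to 0 degrees and GGC (pct_ggc) to max_roll; with the
    default arguments this is Roll = 10 * (% + 36) / (45 + 36).
    """
    return max_roll * (pct - pct_aaa) / (pct_ggc - pct_aaa)


print(roll_from_percent_out(45.0))                # 10.0 (GGC)
print(roll_from_percent_out(-36.0))               # 0.0 (AAA)
print(roll_from_percent_out(4.5))                 # 5.0 (halfway)
print(roll_from_percent_out(4.5, max_roll=20.0))  # 10.0
```

   The last call illustrates the scaling property described above:
   doubling the maximum roll doubles every value but leaves the shape of
   the profile unchanged.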
   
Notes

   None.
   
References

    1. Goodsell, D.S. & Dickerson, R.E. (1994) "Bending and Curvature
       Calculations in B-DNA" Nucl. Acids Res. 22, 5497-5503.
    2. Satchwell, S.C., Drew, H.R. & Travers, A.A. (1986) J. Mol. Biol.
       191, 659-675.
       
Warnings

   Only the bases A, C, G and T are allowed. If the sequence contains any
   other character, the program exits with a fatal error message.
   
Diagnostic Error Messages

   None.
   
Exit status

   0 if successful.
   
Known bugs

   None.
   
See also

   Program name                          Description
   btwisted     Calculates the twisting in a B-DNA sequence
   chaos        Create a chaos game representation plot for a sequence
   compseq      Counts the composition of dimer/trimer/etc words in a sequence
   dan          Calculates DNA RNA/DNA melting temperature
   freak        Residue/base frequency table or plot
   isochore     Plots isochores in large DNA sequences
   wordcount    Counts words of a specified size in a DNA sequence
   
Author(s)

   This application was written by Ian Longden (il@sanger.ac.uk)
   Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus,
   Hinxton, Cambridge, CB10 1SA, UK.
   
History

The original program ('BEND') is described in the Goodsell & Dickerson paper.

Created 1999/06/09.
Last Updated 1999/06/14.

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
