
                            EMBOSS: backtranseq
     _________________________________________________________________
   
                              Program backtranseq
                                       
Function

   Back translate a protein sequence
   
Description

   backtranseq takes a protein sequence and makes a best estimate of the
   likely nucleic acid sequence it could have come from. It does this by
   using a codon frequency table. For each amino acid, the corresponding
   most frequently occurring codon is used in the construction of the
   nucleic acid sequence.
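
   The selection rule described above can be sketched in a few lines of
   Python. This is an illustration only, not the backtranseq source: the
   names CODON_USAGE, best_codons and back_translate are hypothetical,
   and the fractions in the small table are made-up values rather than
   figures taken from 'Ehum.cut'.

```python
# Hypothetical, abbreviated codon usage data: codon -> (amino acid, fraction).
# The fractions are illustrative values, not taken from Ehum.cut.
CODON_USAGE = {
    "ATG": ("M", 1.00),
    "AAA": ("K", 0.42),
    "AAG": ("K", 0.58),
    "GCA": ("A", 0.23),
    "GCC": ("A", 0.40),
    "GCG": ("A", 0.11),
    "GCT": ("A", 0.26),
}


def best_codons(usage):
    """Map each amino acid to its most frequently used codon."""
    best = {}
    for codon, (aa, fraction) in usage.items():
        if aa not in best or fraction > best[aa][1]:
            best[aa] = (codon, fraction)
    return {aa: codon for aa, (codon, _) in best.items()}


def back_translate(protein, usage):
    """Build a nucleotide sequence from the preferred codon of each residue."""
    table = best_codons(usage)
    return "".join(table[aa] for aa in protein.upper())


print(back_translate("MKA", CODON_USAGE))  # prints ATGAAGGCC
```

   In this sketch a residue that is missing from the table raises a
   KeyError. How backtranseq itself treats such residues is not covered
   here.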
   
  Codon usage table name
  
   backtranseq reads a data file containing a codon frequency table.
   The default is 'Ehum.cut', the human codon frequency table. It is
   important to use a codon frequency table that is appropriate for the
   species that your protein comes from. See the Data files section
   below for more details on these files.
   
Usage

   Here is a sample session with backtranseq. The input is a human
   protein, so the default (human) codon frequency file is used and no
   -cfile option is given.
   
% backtranseq
Back translate a protein sequence
Input sequence: sw:opsd_human
Output sequence [opsd_human.fasta]:

   Here is a session using a Drosophila sequence and codon table:
   
% backtranseq -cfile Edrosophila.cut
Back translate a protein sequence
Input sequence: sw:ach2_drome
Output sequence [ach2_drome.fasta]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
  [-outfile]           seqout     Output sequence USA

   Optional qualifiers:
   -cfile              codon      Codon usage table name

   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers                      Allowed values        Default
   [-sequence]      Sequence USA             Readable sequence     Required
   (Parameter 1)
   [-outfile]       Output sequence USA      Writeable sequence    <sequence>.format
   (Parameter 2)

   Optional qualifiers                       Allowed values        Default
   -cfile           Codon usage table name   Codon usage file in   Ehum.cut
                                             EMBOSS data path

   Advanced qualifiers                       Allowed values        Default
   (none)
   
Input file format

   Any protein sequence USA (Uniform Sequence Address).
   
Output file format

   The output is a nucleotide sequence containing the most favoured back
   translation of the specified protein, using the specified codon usage
   table (which defaults to human).
   
   The output from the backtranslation of the human protein sw:opsd_human
   follows:
     _________________________________________________________________
   
% more opsd_human.fasta
>OPSD_HUMAN P08100 RHODOPSIN.
ATGAACGGCACCGAGGGCCCCAACTTCTACGTGCCCTTCAGCAACGCCACCGGCGTGGTG
AGAAGCCCCTTCGAGTACCCCCAGTACTACCTGGCCGAGCCCTGGCAGTTCAGCATGCTG
GCCGCCTACATGTTCCTGCTGATCGTGCTGGGCTTCCCCATCAACTTCCTGACCCTGTAC
GTGACCGTGCAGCACAAGAAGCTGAGAACCCCCCTGAACTACATCCTGCTGAACCTGGCC
GTGGCCGACCTGTTCATGGTGCTGGGCGGCTTCACCAGCACCCTGTACACCAGCCTGCAC
GGCTACTTCGTGTTCGGCCCCACCGGCTGCAACCTGGAGGGCTTCTTCGCCACCCTGGGC
GGCGAGATCGCCCTGTGGAGCCTGGTGGTGCTGGCCATCGAGAGATACGTGGTGGTGTGC
AAGCCCATGAGCAACTTCAGATTCGGCGAGAACCACGCCATCATGGGCGTGGCCTTCACC
TGGGTGATGGCCCTGGCCTGCGCCGCCCCCCCCCTGGCCGGCTGGAGCAGATACATCCCC
GAGGGCCTGCAGTGCAGCTGCGGCATCGACTACTACACCCTGAAGCCCGAGGTGAACAAC
GAGAGCTTCGTGATCTACATGTTCGTGGTGCACTTCACCATCCCCATGATCATCATCTTC
TTCTGCTACGGCCAGCTGGTGTTCACCGTGAAGGAGGCCGCCGCCCAGCAGCAGGAGAGC
GCCACCACCCAGAAGGCCGAGAAGGAGGTGACCAGAATGGTGATCATCATGGTGATCGCC
TTCCTGATCTGCTGGGTGCCCTACGCCAGCGTGGCCTTCTACATCTTCACCCACCAGGGC
AGCAACTTCGGCCCCATCTTCATGACCATCCCCGCCTTCTTCGCCAAGAGCGCCGCCATC
TACAACCCCGTGATCTACATCATGATGAACAAGCAGTTCAGAAACTGCATGCTGACCACC
ATCTGCTGCGGCAAGAACCCCCTGGGCGACGACGAGGCCAGCGCCACCGTGAGCAAGACC
GAGACCAGCCAGGTGGCCCCCGCC
     _________________________________________________________________
   
Data files

   The codon usage table is read by default from "Ehum.cut" in the
   'data/CODONS' directory of the EMBOSS distribution. If the name of a
   codon usage file is specified on the command line, then this file will
   first be searched for in the current directory and then in the
   'data/CODONS' directory of the EMBOSS distribution.
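
   The search order described above can be sketched as follows. This is
   a minimal illustration, not EMBOSS code: the function name
   find_codon_file and the emboss_root parameter (the top-level
   directory of the EMBOSS distribution) are hypothetical.

```python
import os


def find_codon_file(name, emboss_root):
    """Return the first existing path for `name`, or None if not found.

    The search order follows the text above: the current directory
    first, then data/CODONS under the EMBOSS distribution root.
    """
    candidates = [
        os.path.join(os.getcwd(), name),
        os.path.join(emboss_root, "data", "CODONS", name),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

   A local copy of a table in the current directory therefore takes
   precedence over the copy shipped with the distribution.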
   
   To see the available EMBOSS codon usage files, run:

% embossdata -showall

   To fetch one of the codon usage tables (for example 'Emus.cut') into
   your current directory for you to inspect or modify, run:

% embossdata -fetch -file Emus.cut

Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   "Corrupt codon index file" - the codon usage file is incomplete or
   empty.
   
   "The file 'drosoph.cut' does not exist" - the codon usage file cannot
   be opened.
   
Exit status

   This program exits with a status of 0 unless the codon usage table
   cannot be opened.
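
   A calling script can test this status. The sketch below assumes that
   backtranseq is on the PATH and that the general EMBOSS qualifier
   -auto (which turns off prompting) is available. The helper names
   build_command and run_backtranseq are hypothetical.

```python
import subprocess


def build_command(sequence, outfile, cfile=None):
    """Assemble a backtranseq command line as an argument list."""
    # -auto is assumed here to suppress the interactive prompts.
    cmd = ["backtranseq", "-sequence", sequence, "-outfile", outfile, "-auto"]
    if cfile is not None:
        cmd += ["-cfile", cfile]
    return cmd


def run_backtranseq(sequence, outfile, cfile=None):
    """Run backtranseq and return True if it exited with status 0."""
    result = subprocess.run(build_command(sequence, outfile, cfile))
    return result.returncode == 0
```

   For example, run_backtranseq("sw:ach2_drome", "ach2_drome.fasta",
   cfile="Edrosophila.cut") would be expected to return False if the
   codon usage table could not be opened.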
   
Known bugs

   None.
   
See also

   Program name   Description
   charge         Protein charge plot
   checktrans     Reports STOP codons and ORF statistics of a protein sequence
   coderet        Extract CDS, mRNA and translations from feature tables
   compseq        Counts the composition of dimer/trimer/etc words in a sequence
   emowse         Protein identification by mass spectrometry
   freak          Residue/base frequency table or plot
   iep            Calculates the isoelectric point of a protein
   mwfilter       Filter noisy molwts from mass spec output
   octanol        Displays protein hydropathy
   pepinfo        Plots simple amino acid properties in parallel
   pepstats       Protein statistics
   pepwindow      Displays protein hydropathy
   pepwindowall   Displays protein hydropathy of a set of sequences
   plotorf        Plot potential open reading frames
   prettyseq      Output sequence with translated ranges
   remap          Display a sequence with restriction cut sites, translation etc
   showorf        Pretty output of DNA translations
   showseq        Display a sequence with features, translation etc
   transeq        Translate nucleic acid sequences
   
Author(s)

   This application was written by Alan Bleasby (ableasby@hgmp.mrc.ac.uk)
   
History

   Completed 6 Oct 1999
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
