
                             EMBOSS: wordcount
     _________________________________________________________________
   
                               Program wordcount
                                       
Function

   Counts words of a specified size in a DNA sequence
   
Description

Displays the words of the specified length, each with the number of
times it occurs in the sequence.
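   The counting step is illustrated by the short Python sketch below. It
   is not the EMBOSS source, and it rests on stated assumptions: every
   overlapping window of wordsize bases is counted, bases are compared
   case-insensitively, and words with equal counts are ordered
   alphabetically. wordcount's own ordering of tied words may differ.
   The function name count_words is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def count_words(seq: str, wordsize: int) -> List[Tuple[str, int]]:
    """Count words of length `wordsize` in `seq` (hypothetical helper).

    Assumption: overlapping windows are counted, so a sequence of n bases
    yields n - wordsize + 1 words.
    """
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    seq = seq.lower()  # assumption: case-insensitive comparison
    counts = Counter(seq[i:i + wordsize]
                     for i in range(len(seq) - wordsize + 1))
    # Most frequent first; ties broken alphabetically (an arbitrary choice).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for word, n in count_words("GATTACAGATTACA", 3):
        print(f"{word}\t{n}")
```

   Under the overlapping-window assumption, the counts for a sequence of
   n bases always sum to n - wordsize + 1.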

Usage

   Here is a sample session with wordcount.
   
% wordcount embl:rnu68037 -wordsize=3
Counts words of a specified size in a DNA sequence
Output file [rnu68037.wordcount]:

Command line arguments

   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
   -wordsize           integer    Word size
   -outfile            outfile    Output file name

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
   General qualifiers:
  -help                bool       report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   

   Mandatory qualifiers        Description        Allowed values      Default
   [-sequence] (Parameter 1)   Sequence USA       Readable sequence   Required
   -wordsize                   Word size          Integer 2 or more   4
   -outfile                    Output file name   Output file         <sequence>.wordcount

   Optional qualifiers: (none)
   Advanced qualifiers: (none)
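   A script can assemble these qualifiers into a command line. The Python
   helper below is hypothetical and not part of EMBOSS. It enforces the
   documented rule that -wordsize is an integer of 2 or more, uses the
   documented default of 4, and uses the -qualifier=value form from the
   sample session. It assumes that giving -outfile on the command line
   stops wordcount from prompting for the output file name. It only
   builds the argument list; nothing is executed.

```python
import shlex
from typing import List, Optional


def wordcount_argv(sequence: str, wordsize: int = 4,
                   outfile: Optional[str] = None) -> List[str]:
    """Build an argument list for wordcount (hypothetical helper).

    sequence -- a sequence USA such as "embl:rnu68037" (parameter 1)
    wordsize -- word size; documented minimum 2, default 4
    outfile  -- output file name; if None, -outfile is omitted and
                wordcount is expected to prompt, offering
                <sequence>.wordcount as the default
    """
    if wordsize < 2:
        raise ValueError("-wordsize must be an integer of 2 or more")
    argv = ["wordcount", sequence, f"-wordsize={wordsize}"]
    if outfile is not None:
        argv.append(f"-outfile={outfile}")
    return argv


if __name__ == "__main__":
    # Prints the command line only; it does not run wordcount.
    print(shlex.join(wordcount_argv("embl:rnu68037", 3, "rnu68037.wordcount")))
    # wordcount embl:rnu68037 -wordsize=3 -outfile=rnu68037.wordcount
```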
   
Input file format

   Any sequence USA (Uniform Sequence Address).
   
Output file format

   The output file produced in the above example is:
     _________________________________________________________________
   
ctg     54
tgg     53
gcc     53
ggc     51
cgc     47
gct     47
gtg     40
tgc     39
cct     38
gcg     36
cca     29
ggg     26
cag     25
ctt     25
tcc     25
ggt     24
ccc     24
ctc     23
tgt     23
gca     22
cgt     22
ccg     22
cac     22
agc     21
acg     19
ttg     19
cgg     19
tcg     18
ttc     17
cat     17
agg     17
act     16
gtc     16
gag     16
aac     15
gga     14
atc     14
tct     14
tca     13
cta     13
atg     12
gtt     11
acc     11
gta     11
aca     10
tac     10
tga     10
caa     10
gac     9
agt     9
tag     9
ttt     8
cga     7
gat     6
taa     6
tat     5
aga     5
gaa     4
aat     3
ata     3
tta     3
att     3
aag     2
aaa     1
     _________________________________________________________________
   
   The file consists of two columns separated by spaces or TAB
   characters.
   
   The first column lists the words of size wordsize. The second column
   gives the number of times each word occurs in the input sequence. In
   the example above, lines are ordered by decreasing count.
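   Because the format is just whitespace-separated columns, it is easy
   to read back into a program. The Python sketch below parses the file
   into a dictionary and lists which possible DNA words of a given size
   are absent. The function names are hypothetical, and the sketch
   assumes lowercase words over the alphabet acgt, as in the example.
   For wordsize 3 there are 4^3 = 64 possible words, which matches the 64
   lines of the example output.

```python
from itertools import product
from typing import Dict, List


def read_wordcount(text: str) -> Dict[str, int]:
    """Parse wordcount output into {word: count} (hypothetical helper).

    Each non-blank line must hold exactly two fields, the word and its
    count, separated by spaces or TABs.
    """
    counts: Dict[str, int] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of spaces and/or TABs
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        word, count = fields
        counts[word] = int(count)
    return counts


def missing_words(counts: Dict[str, int], wordsize: int,
                  alphabet: str = "acgt") -> List[str]:
    """Return every possible word of length wordsize not present in counts."""
    possible = ("".join(p) for p in product(alphabet, repeat=wordsize))
    return [w for w in possible if w not in counts]


if __name__ == "__main__":
    sample = "ctg     54\ntgg\t53\naaa     1\n"
    counts = read_wordcount(sample)
    print(counts)                         # {'ctg': 54, 'tgg': 53, 'aaa': 1}
    print(len(missing_words(counts, 3)))  # 61
```

   To read the file from the example, pass
   Path("rnu68037.wordcount").read_text() (from pathlib) to
   read_wordcount.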
   
Data files

   None.
   
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   0 if successful.
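   Because 0 is the documented status for success, a calling script can
   test the exit status directly. This Python sketch uses the standard
   subprocess module. The function name is hypothetical, and treating
   every non-zero status as failure is an assumption, since only the
   success value is documented here. The demonstration uses the Python
   interpreter as a stand-in command so it runs without EMBOSS
   installed.

```python
import subprocess
import sys
from typing import Sequence


def run_and_check(argv: Sequence[str]) -> bool:
    """Run a command; return True if it exits with status 0.

    Assumption: any non-zero status means failure. Raises
    FileNotFoundError if the program (e.g. wordcount) is not on PATH.
    """
    completed = subprocess.run(list(argv))
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in commands; replace with a real wordcount command line.
    print(run_and_check([sys.executable, "-c", "pass"]))                 # True
    print(run_and_check([sys.executable, "-c", "raise SystemExit(1)"]))  # False
```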
   
Known bugs

   None.
   
See also

   Program name                          Description
   banana       Bending and curvature plot in B-DNA
   btwisted     Calculates the twisting in a B-DNA sequence
   chaos        Create a chaos game representation plot for a sequence
   compseq      Counts the composition of dimer/trimer/etc words in a sequence
   dan          Calculates DNA RNA/DNA melting temperature
   freak        Residue/base frequency table or plot
   isochore     Plots isochores in large DNA sequences
   
Author(s)

   This application was written by Ian Longden (il@sanger.ac.uk)
   Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus,
   Hinxton, Cambridge, CB10 1SA, UK.
   
History

   Completed 27th November 1998.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
