EMBOSS: siggen
Each position in the alignment is scored using a single scoring scheme or any combination of up to three schemes. For example, a signature of 10% sparsity includes data from the highest-scoring 10% of alignment positions.
The resulting protein signature file is used by the application sigscan to find examples of the signature in other proteins.
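The sparsity selection described above can be illustrated with a minimal Python sketch. It is not siggen's implementation: the function name `select_signature_positions`, the rounding rule (round up, keep at least one position), and the sample scores are all assumptions made for illustration. It only shows how a percentage sparsity maps to a set of top-scoring alignment positions.

```python
import math


def select_signature_positions(scores, sparsity_percent=10):
    """Return indices of the highest-scoring positions, in alignment order.

    `scores` holds one (already combined) score per alignment position.
    The rounding rule below is an assumption, not documented siggen behaviour.
    """
    if not 0 < sparsity_percent <= 100:
        raise ValueError("sparsity_percent must be in the range (0, 100]")
    # Number of positions to keep: round up, and keep at least one.
    n_keep = max(1, math.ceil(len(scores) * sparsity_percent / 100))
    # Rank position indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Report the kept positions in their original alignment order.
    return sorted(ranked[:n_keep])


# Hypothetical per-position scores for a 20-column alignment.
example_scores = [0.2, 0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.05, 0.6, 0.5,
                  0.15, 0.95, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.0]
print(select_signature_positions(example_scores, 10))  # prints [1, 11]
```

With 20 positions and 10% sparsity, two positions are kept: the ones scoring 0.95 and 0.9.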
% siggen
Generates a sparse protein signature
Location of alignment files for input [./]: ./jontest
Extension of alignment files for input [.align]:
Location of contact files for input [./]: ./jontest
Extension of contact files [.con]:
% sparsity of signature [10]:
Generate a randomized signature [N]:
Substitution matrix to be used [./EBLOSUM62]:
Score alignment on basis of residue conservation [Y]:
Score alignment on basis of number of contacts [Y]:
Score alignment on basis of conservation of contacts [Y]: N
Score alignment on a combined measure of number and conservation of contacts [N]:
Ignore alignment postitions with post_similar value of 0 [Y]:
Name of signature file for output [sig.sig]:
Mandatory qualifiers (* if not always prompted):
[-algpath] string Location of alignment files for input
[-algextn] string Extension of alignment files for input
[-sparsity] integer % sparsity of signature
[-randomise] bool Generate a randomised signature
* -seqoption menu Select number
* -datafile matrixf Substitution matrix to be used
* -conoption menu Select number
* -filtercon bool Ignore alignment positions making less than a threshold number of contacts
* -conthresh integer Threshold contact number
* -conpath string Location of contact files for input
* -conextn string Extension of contact files
* -cpdbpath string Location of coordinate files for input (embl-like format)
* -cpdbextn string Extension of coordinate files (embl-like format)
* -filterpsim bool Ignore alignment postitions with post_similar value of 0
[-sigpath] string Location of signature files for output
[-sigextn] string Extension of signature files for output
Optional qualifiers: (none)
Advanced qualifiers: (none)
General qualifiers:
-help bool report command line options. More information on associated and general qualifiers can be found with -help -verbose
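As a rough illustration of how the qualifiers listed above fit together on a command line, the following Python sketch assembles an argument list from the string and integer qualifiers only. The helper name `build_siggen_command` is hypothetical, the defaults are copied from the qualifier listing, and the sketch does not run siggen; siggen may still prompt for any qualifier not supplied.

```python
import shlex


def build_siggen_command(algpath="./", algextn=".align", sparsity=10,
                         sigpath="./", sigextn=".sig"):
    """Assemble an argv list for siggen (hypothetical helper, not part of EMBOSS).

    Only string and integer qualifiers from the listing are covered here.
    """
    return [
        "siggen",
        "-algpath", algpath,
        "-algextn", algextn,
        "-sparsity", str(sparsity),
        "-sigpath", sigpath,
        "-sigextn", sigextn,
    ]


cmd = build_siggen_command(algpath="./jontest")
# shlex.join quotes arguments where needed so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a system where EMBOSS is installed; that step is left out so the sketch stays runnable without it.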
| Mandatory qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| [-algpath] (Parameter 1) | Location of alignment files for input | Any string is accepted | ./ |
| [-algextn] (Parameter 2) | Extension of alignment files for input | Any string is accepted | .align |
| [-sparsity] (Parameter 3) | % sparsity of signature | Any integer value | 10 |
| [-randomise] (Parameter 4) | Generate a randomised signature | Yes/No | No |
| -seqoption | Select number | | 3 |
| -datafile | Substitution matrix to be used | Comparison matrix file in EMBOSS data path | ./EBLOSUM62 |
| -conoption | Select number | | 4 |
| -filtercon | Ignore alignment positions making less than a threshold number of contacts | Yes/No | No |
| -conthresh | Threshold contact number | Any integer value | 10 |
| -conpath | Location of contact files for input | Any string is accepted | /data/contacts/ |
| -conextn | Extension of contact files | Any string is accepted | .con |
| -cpdbpath | Location of coordinate files for input (embl-like format) | Any string is accepted | /data/cpdbscop/ |
| -cpdbextn | Extension of coordinate files (embl-like format) | Any string is accepted | .pxyz |
| -filterpsim | Ignore alignment postitions with post_similar value of 0 | Yes/No | No |
| [-sigpath] (Parameter 5) | Location of signature files for output | Any string is accepted | ./ |
| [-sigextn] (Parameter 6) | Extension of signature files for output | Any string is accepted | .sig |
| Optional qualifiers | | Allowed values | Default |
| (none) | | | |
| Advanced qualifiers | | Allowed values | Default |
| (none) | | | |
Example excerpt from an output signature file:
CL All beta proteins
XX
FO Lipocalins
XX
SF Lipocalins
XX
FA Fatty acid binding protein-like
XX
NP 2
XX
NN [1]
XX
IN NRES 3 ; NGAP 2 ; WSIZ 2
XX
AA A ; 2
AA V ; 1
AA L ; 4
XX
GA 1 ; 5
GA 2 ; 2
XX
NN [2]
XX
IN NRES 2 ; NGAP 2 ; WSIZ 5
XX
AA F ; 1
AA Y ; 5
XX
GA 12 ; 3
GA 10 ; 2
XX
//
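To make the record layout of the excerpt easier to follow, here is a minimal Python sketch that reads such a file into a dictionary. It is based only on the excerpt shown: the function name `parse_signature` and the dictionary keys are hypothetical, the one-record-per-line layout is assumed, and nothing is assumed about what the fields (for example NRES, NGAP, WSIZ) mean beyond storing their values. A real signature file may contain record types not handled here.

```python
def parse_signature(text):
    """Parse a signature excerpt into a dict (illustrative, not an EMBOSS API)."""
    sig = {"positions": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "XX":      # XX lines are separators
            continue
        if line == "//":                  # end-of-entry marker
            break
        code, _, value = line.partition(" ")
        value = value.strip()
        if code in ("CL", "FO", "SF", "FA"):
            sig[code] = value             # classification text records
        elif code == "NP":
            sig["NP"] = int(value)        # number of signature positions
        elif code == "NN":
            current = {"number": int(value.strip("[]")),
                       "residues": [], "gaps": []}
            sig["positions"].append(current)
        elif code == "IN":
            for field in value.split(";"):
                name, count = field.split()
                current[name] = int(count)
        elif code == "AA":
            residue, count = (part.strip() for part in value.split(";"))
            current["residues"].append((residue, int(count)))
        elif code == "GA":
            gap, count = (int(part) for part in value.split(";"))
            current["gaps"].append((gap, count))
    return sig


excerpt = """CL All beta proteins
XX
FO Lipocalins
XX
SF Lipocalins
XX
FA Fatty acid binding protein-like
XX
NP 2
XX
NN [1]
XX
IN NRES 3 ; NGAP 2 ; WSIZ 2
XX
AA A ; 2
AA V ; 1
AA L ; 4
XX
GA 1 ; 5
GA 2 ; 2
XX
NN [2]
XX
IN NRES 2 ; NGAP 2 ; WSIZ 5
XX
AA F ; 1
AA Y ; 5
XX
GA 12 ; 3
GA 10 ; 2
XX
//"""

signature = parse_signature(excerpt)
print(signature["FA"], len(signature["positions"]))
```

Each `NN` record starts a new position, and the `IN`, `AA` and `GA` records that follow are attached to it.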
Important
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".
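The directories are searched in the following order:
- . (your current directory)
- .embossdata (under your current directory)
- ~/ (your home directory)
- ~/.embossdata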
| Program name | Description |
|---|---|
| contacts | Reads coordinate files and writes contact files |
| dichet | Parse dictionary of heterogen groups |
| interface | Reads coordinate files and writes inter-chain contact files |
| psiblasts | Runs PSI-BLAST given scopalign alignments |
| scopalign | Generate alignments for SCOP families |
| seqsort | Removes ambiguities from a set of hits resulting from a database search |
| sigscan | Scans a sparse protein signature against swissprot |