Command Line MotifScanner
- Required arguments
- Optional arguments
- Output definitions
- Background model description
- Motif model description
MotifScanner is our algorithm for finding instances of pre-defined motifs in DNA sequences. It is based on a probabilistic sequence model, which assumes that motif instances are hidden in a noisy background sequence. The background itself is described by a higher-order background model.
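The idea of motif instances hidden in background sequence can be illustrated with a simple log-odds score that compares how well a window of sequence fits the motif matrix with how well it fits the background. The following Python sketch is only an illustration under stated assumptions and not necessarily the computation MotifScanner performs. It uses a zero-order background (single-nucleotide frequencies) instead of a higher-order model, ignores the prior set with -p, and the names log_odds and best_window are hypothetical.

```python
import math

BASES = "ACGT"  # assumed column order of the matrices: A, C, G, T


def log_odds(window, motif, background):
    """Log-likelihood ratio of one window: motif model versus background.

    motif      -- list of rows, one per motif position, each row holding
                  the probabilities of A, C, G and T at that position
    background -- single-nucleotide frequencies in the same order
                  (a zero-order simplification, assumed for this sketch)
    """
    if len(window) != len(motif):
        raise ValueError("window length must equal the motif width")
    score = 0.0
    for base, row in zip(window.upper(), motif):
        i = BASES.index(base)  # raises ValueError for symbols such as N
        score += math.log(row[i] / background[i])
    return score


def best_window(sequence, motif, background):
    """Return (score, start) of the best-scoring window on the given strand."""
    width = len(motif)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scores = [
        (log_odds(sequence[start:start + width], motif, background), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Toy 3-position motif that strongly prefers "ACG" (made-up numbers).
toy_motif = [
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.01, 0.01, 0.97, 0.01],
]
uniform_background = [0.25, 0.25, 0.25, 0.25]
score, start = best_window("TTTACGTTT", toy_motif, uniform_background)
```

A positive score means the window is better explained by the motif than by the background. A real scan would additionally weigh this ratio against the prior probability of finding a motif instance.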
Some basic remarks on the program:
- The program is started from the command line. The required and optional arguments are described in full below.
- The final results are written in GFF format, either to STDOUT or to a file.
- Motif models are stored in a specific format (see below).
- Background models are stored in a specific format (see below).
- Progress messages are written to STDERR, so you can monitor the run there.
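Because the results are written in GFF format, they can be post-processed with a few lines of code. The sketch below splits one GFF line into the standard tab-separated fields (seqname, source, feature, start, end, score, strand, frame and an optional attributes field). It is a generic GFF reader, not something shipped with MotifScanner. The names GffRecord and parse_gff_line are hypothetical, and the sample line is made up for illustration; it is not real MotifScanner output.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str       # kept as text because GFF allows "." for "no score"
    strand: str
    frame: str
    attributes: str  # empty string when the optional ninth field is absent


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF line; return None for blank lines and comment lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t", 8)
    if len(fields) < 8:
        raise ValueError("a GFF line needs at least 8 tab-separated fields")
    attributes = fields[8] if len(fields) == 9 else ""
    return GffRecord(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7], attributes,
    )


# Made-up example line, only to show the field layout.
sample = 'seq1\tMotifScanner\tmisc_feature\t10\t17\t0.93\t+\t.\tid "gbox";'
record = parse_gff_line(sample)
```

Which values MotifScanner places in the source, feature and attributes fields should be checked against an actual output file.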
Required arguments

Switch | Argument | Description |
---|---|---|
-f | file | Input sequences in FASTA format. |
-m | file | Motif model description. The file format is described below. |
-b | file | Background model description. The file format is described below. |
Optional arguments

Switch | Argument | Description |
---|---|---|
-s | 0 or 1 | Search a single strand (0) or both strands (1). Default = 0. |
-p | value | Prior probability of finding one instance of the motif. This value controls how stringent the search is: with a prior close to 0, only the best-conserved instances are retrieved, while increasing the prior allows more degenerate instances to be reported. |
-l | file | File listing the IDs of the motif models to select from the given matrix file. Such a list is useful when you only want to test a subset of the motif models in your matrix file without editing the matrix file itself. |
Output definitions

Switch | Argument | Description |
---|---|---|
-o | file | Output file in which to save the results. The motif instances found are written to this file in GFF format. By default, the results are written to STDOUT. |
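To show how the switches combine, here is a small Python sketch that assembles an argument list from the options described above. Only the switches -f, -m, -b, -s, -p, -l and -o come from this page. The executable name "MotifScanner", the function name build_command and the file names in the example are assumptions, so adjust them to your installation.

```python
def build_command(fasta, motifs, background, *, strand=None, prior=None,
                  id_list=None, output=None, executable="MotifScanner"):
    """Assemble the argument list for one MotifScanner run.

    The executable name is an assumption; the switches are the ones
    documented in the tables above.
    """
    cmd = [executable, "-f", fasta, "-m", motifs, "-b", background]
    if strand is not None:
        if strand not in (0, 1):
            raise ValueError("-s accepts only 0 (single strand) or 1 (both strands)")
        cmd += ["-s", str(strand)]
    if prior is not None:
        cmd += ["-p", str(prior)]
    if id_list is not None:
        cmd += ["-l", id_list]
    if output is not None:
        cmd += ["-o", output]  # without -o the results go to STDOUT
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_command("seqs.fasta", "motifs.mtrx", "athaliana.bg",
                    strand=1, prior=0.2, output="hits.gff")
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the Python standard library, or joined with spaces and typed at the command line.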
Background Model Description |
---|
A pre-compiled background model is stored in an ASCII text file using a well-defined (and not very flexible) format. To create such a background model file from a set of input sequences, you can use the tool found on our download page.
Below is an example of a first-order Arabidopsis thaliana background model file. The file must always start with the word #INCLUSive at the very first position. Next come several lines describing the order of the background model, the organism and the data set. Finally, the data follow in three blocks, introduced by the lines "#snf", "#oligo frequency" and "#transition matrix".
--
#INCLUSive Background Model v1.0
#
#Order = 1
#Organism = athaliana
#Sequences = /users/sista/thijs/scratch/DNA/intergenic.tfa
#Path =
#
#snf
0.3449 0.1581 0.1556 0.3414
#oligo frequency
0.3449
0.1581
0.1556
0.3414
#transition matrix
0.3911 0.1516 0.1482 0.3091
0.3760 0.1703 0.1268 0.3269
0.3630 0.1389 0.1671 0.3311
0.2756 0.1678 0.1711 0.3855
--
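As an illustration, the background model file shown above could be read with a few lines of Python. This is only a sketch and not part of the INCLUSive tools. The function name parse_background_model and the dictionary layout are hypothetical, and the parser has only been checked against the first-order example on this page, so its behaviour for other orders is an assumption.

```python
EXAMPLE_BACKGROUND = """\
#INCLUSive Background Model v1.0
#
#Order = 1
#Organism = athaliana
#Sequences = /users/sista/thijs/scratch/DNA/intergenic.tfa
#Path =
#
#snf
0.3449 0.1581 0.1556 0.3414
#oligo frequency
0.3449
0.1581
0.1556
0.3414
#transition matrix
0.3911 0.1516 0.1482 0.3091
0.3760 0.1703 0.1268 0.3269
0.3630 0.1389 0.1671 0.3311
0.2756 0.1678 0.1711 0.3855
"""

# Marker lines that introduce the three data blocks.
SECTIONS = {
    "#snf": "snf",
    "#oligo frequency": "oligo_frequency",
    "#transition matrix": "transition_matrix",
}


def parse_background_model(text):
    """Parse a background model file into a dictionary (illustrative only)."""
    if not text.startswith("#INCLUSive"):
        raise ValueError("the file must start with #INCLUSive")
    model = {"header": {}, "snf": [], "oligo_frequency": [], "transition_matrix": []}
    current = None
    for raw in text.splitlines()[1:]:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.lower() in SECTIONS:
                current = SECTIONS[line.lower()]
            elif "=" in line:
                # Header line such as "#Order = 1"; values are kept as text.
                name, value = line[1:].split("=", 1)
                model["header"][name.strip()] = value.strip()
            continue
        if current is None:
            raise ValueError("numeric data found before any section marker")
        values = [float(x) for x in line.split()]
        if current == "transition_matrix":
            model[current].append(values)   # one row per line
        else:
            model[current].extend(values)   # flat list of frequencies
    return model


background = parse_background_model(EXAMPLE_BACKGROUND)
```

A useful sanity check after parsing is that the "#snf" values and each row of the transition matrix sum to 1 within rounding error, which holds for the example above.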
Motif Model Description |
---|
A set of pre-compiled motif models is stored in an ASCII text file using a well-defined (and not very flexible) format. Such a file can be produced by the MotifSampler.
Below is an example of a motif model file with 2 motifs. The file must always start with the word #INCLUSive at the very first position. For each motif, the lines starting with "#ID" and "#W" are obligatory, while the lines starting with "#Score" and "#Consensus" are optional. The data matrix follows the identifiers. It must always have 4 columns, separated by tabs or spaces (in the order A, C, G, T, as the consensus lines in the example show), and as many rows as the value given on the preceding "#W" line.
--
#INCLUSive Motif Model v1.0
#
#ID = gbox
#Score = 274.657
#W = 8
#Consensus = nmCACGTG
0.1857 0.0559 0.5727 0.1857
0.3518 0.5912 0.0559 0.0011
0.0011 0.9973 0.0005 0.0011
0.9979 0.0005 0.0005 0.0011
0.0011 0.9973 0.0005 0.0011
0.0011 0.0559 0.9419 0.0011
0.0011 0.0005 0.0005 0.9979
0.0011 0.0374 0.9604 0.0011
#ID = sigma54
#Score = 91.5958
#W = 16
#Consensus = TGGCACrAnnnnTGCw
0.00575087 0.00400983 0.00393972 0.9863
0.00575087 0.00400983 0.984465 0.00577424
0.00575087 0.00400983 0.984465 0.00577424
0.00575087 0.984535 0.00393972 0.00577424
0.986276 0.00400983 0.00393972 0.00577424
0.00575087 0.682835 0.00393972 0.307474
0.458301 0.00400983 0.45649 0.0811993
0.684576 0.30571 0.00393972 0.00577424
0.232026 0.30571 0.0793647 0.382899
0.382876 0.15486 0.15479 0.307474
0.307451 0.30571 0.15479 0.232049
0.156601 0.15486 0.15479 0.533749
0.00575087 0.00400983 0.00393972 0.9863
0.00575087 0.00400983 0.90904 0.0811993
0.0811759 0.75826 0.00393972 0.156624
0.609151 0.00400983 0.00393972 0.382899
--
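The rules above (obligatory "#ID" and "#W" lines, 4 columns, and as many rows as "#W" states) can be checked mechanically. The following Python sketch parses a motif model file and validates these rules. It is an illustration only, not the reader used by MotifScanner. The names parse_motif_models and select_motifs are hypothetical, and select_motifs merely mimics the kind of selection that the -l option describes.

```python
EXAMPLE_MOTIFS = """\
#INCLUSive Motif Model v1.0
#
#ID = gbox
#Score = 274.657
#W = 8
#Consensus = nmCACGTG
0.1857 0.0559 0.5727 0.1857
0.3518 0.5912 0.0559 0.0011
0.0011 0.9973 0.0005 0.0011
0.9979 0.0005 0.0005 0.0011
0.0011 0.9973 0.0005 0.0011
0.0011 0.0559 0.9419 0.0011
0.0011 0.0005 0.0005 0.9979
0.0011 0.0374 0.9604 0.0011
"""


def parse_motif_models(text):
    """Parse a motif model file into a list of dictionaries (illustrative)."""
    if not text.startswith("#INCLUSive"):
        raise ValueError("the file must start with #INCLUSive")
    motifs = []
    current = None
    for raw in text.splitlines()[1:]:
        line = raw.strip()
        if not line or line == "#":
            continue
        if line.startswith("#"):
            name, _, value = line[1:].partition("=")
            name, value = name.strip(), value.strip()
            if name == "ID":
                # Each "#ID" line starts a new motif.
                current = {"ID": value, "matrix": []}
                motifs.append(current)
            elif current is not None:
                current[name] = value  # e.g. Score, W, Consensus (kept as text)
            continue
        if current is None:
            raise ValueError("matrix row found before any #ID line")
        row = [float(x) for x in line.split()]
        if len(row) != 4:
            raise ValueError("each matrix row must have exactly 4 columns")
        current["matrix"].append(row)
    for motif in motifs:
        if "W" not in motif:
            raise ValueError("motif %s lacks the obligatory #W line" % motif["ID"])
        if len(motif["matrix"]) != int(motif["W"]):
            raise ValueError("motif %s: row count differs from #W" % motif["ID"])
    return motifs


def select_motifs(motifs, wanted_ids):
    """Keep only the motifs whose ID is listed (mimics the -l selection)."""
    wanted = set(wanted_ids)
    return [m for m in motifs if m["ID"] in wanted]


motifs = parse_motif_models(EXAMPLE_MOTIFS)
```

For the gbox example, taking the most probable base in rows 3 to 8 gives CACGTG, which matches the uppercase part of the "#Consensus" line.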
This page is maintained by Gert Thijs. Last update 2005/06/10.
Email: gert.thijs@esat.kuleuven.be
Copyright © 2002-2005 KULeuven.