Command Line MotifScanner

Overview

Required arguments
Optional arguments
Output description
Background model description
Motif model description

MotifScanner is our algorithm for finding instances of pre-defined motifs in DNA sequences, based on a probabilistic sequence model. The model assumes that motif instances are hidden in a noisy background sequence, which is described by a higher-order background model.
Some basic remarks on the program:

The program is started from the command line. A full description of the required and optional arguments can be found below.
The final results are written in GFF format, either to STDOUT or to a file.
Motif models and background models are each stored in a specific format (see below).
Progress messages are written to STDERR, so you can monitor the program while it runs.

Required Arguments

Switch	Argument	Description
-f	file	Input sequences in FASTA format.
-m	file	Motif model description. The file format is described below.
-b	file	Background model description. The file format is described below.

Optional Arguments

Switch	Argument	Description
-s	0|1	Choose single-stranded (0) or double-stranded (1) search. Default = 0.
-p	value	Set the prior probability of finding one instance of the motif. This value lets the user control how strict the search is: if the prior is close to 0, only the best-conserved instances are retrieved; increasing the prior allows more degeneracy in the instances found.
-l	file	File with the IDs of the motif models to select from the given matrix file. Such a list is useful when you want to test only a subset of the motif models in your matrix file without editing the matrix file.

Output Description

Switch	Argument	Description
-o	file	Set the output file in which to save the results. The motif instances found are written to this file in GFF format. By default, the results are written to STDOUT.
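
As an illustration of how the switches above combine, the following Python sketch assembles an argument list that could be passed to subprocess.run. Only the switches (-f, -m, -b, -s, -p, -l, -o) come from this page. The executable path and all file names are hypothetical placeholders, and the helper function itself is not part of MotifScanner.

```python
def build_motifscanner_command(executable, fasta, motifs, background,
                               both_strands=False, prior=None,
                               id_list=None, output=None):
    """Assemble a MotifScanner argument list from the documented switches."""
    # Required arguments: sequences, motif models, background model.
    cmd = [executable, "-f", fasta, "-m", motifs, "-b", background]
    # -s: 0 = single-stranded search (the default), 1 = double-stranded.
    cmd += ["-s", "1" if both_strands else "0"]
    if prior is not None:
        # -p: prior probability of finding one motif instance.
        cmd += ["-p", str(prior)]
    if id_list is not None:
        # -l: file listing the IDs of the motif models to use.
        cmd += ["-l", id_list]
    if output is not None:
        # -o: GFF output file; without it, results go to STDOUT.
        cmd += ["-o", output]
    return cmd


# Hypothetical executable path and file names, for illustration only.
cmd = build_motifscanner_command(
    "./MotifScanner", "upstream.fasta", "motifs.txt", "athaliana_bg.txt",
    both_strands=True, prior=0.2, output="hits.gff",
)
print(" ".join(cmd))
```

Building a list rather than a single string avoids shell quoting problems when file names contain spaces.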

Background Model Description

A pre-compiled background model is stored in an ASCII text file using a well-defined (and not very flexible) format. To create such a background model file from a set of input sequences, you can use the tool found on our download page.
Below is an example of the first-order Arabidopsis thaliana background model file. The file must always start with the word #INCLUSive at the very first position. Next come several lines describing the organism, the data set, and the order of the background model. Finally, the data themselves are listed.

--
#INCLUSive Background Model v1.0
#
#Order = 1
#Organism = athaliana
#Sequences = /users/sista/thijs/scratch/DNA/intergenic.tfa
#Path = 
#

#snf
0.3449  0.1581  0.1556  0.3414  

#oligo frequency 
0.3449
0.1581
0.1556
0.3414

#transition matrix
0.3911  0.1516  0.1482  0.3091  
0.3760  0.1703  0.1268  0.3269  
0.3630  0.1389  0.1671  0.3311  
0.2756  0.1678  0.1711  0.3855  
--
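
The layout of the example above can be read with a few lines of Python. This is a minimal sketch based only on that example, not the parser used by MotifScanner: it assumes that "#Key = value" lines form the header, that any other "#" line with text names a data section, and that numeric lines belong to the most recent section. The function name and the returned structure are illustrative choices.

```python
SAMPLE = """#INCLUSive Background Model v1.0
#
#Order = 1
#Organism = athaliana
#Sequences = /users/sista/thijs/scratch/DNA/intergenic.tfa
#Path = 
#

#snf
0.3449  0.1581  0.1556  0.3414  

#oligo frequency 
0.3449
0.1581
0.1556
0.3414

#transition matrix
0.3911  0.1516  0.1482  0.3091  
0.3760  0.1703  0.1268  0.3269  
0.3630  0.1389  0.1671  0.3311  
0.2756  0.1678  0.1711  0.3855  
"""


def parse_background_model(text):
    """Return (header, sections) read from a background model file."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#INCLUSive"):
        raise ValueError("file must start with #INCLUSive")
    header, sections, current = {}, {}, None
    for line in lines[1:]:
        line = line.strip()
        if not line or line == "#":
            continue  # blank line or empty comment line
        if line.startswith("#"):
            body = line[1:].strip()
            if "=" in body:
                # Header entry such as "#Order = 1".
                key, value = body.split("=", 1)
                header[key.strip()] = value.strip()
            else:
                # Section title such as "#transition matrix".
                current = body
                sections[current] = []
        elif current is not None:
            sections[current].append([float(x) for x in line.split()])
    return header, sections


header, sections = parse_background_model(SAMPLE)
print("order:", header["Order"])
# Each row of the transition matrix should sum to roughly 1.
for row in sections["transition matrix"]:
    print(row, round(sum(row), 4))
```

Checking the row sums is a cheap way to catch a truncated or hand-edited file before running a scan.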

Motif Model Description

A set of pre-compiled motif models should be stored in an ASCII text file using a well-defined (and not very flexible) format. Such a file can be the output of the MotifSampler.
Below is an example of a motif model file with 2 motifs. The file must always start with the word #INCLUSive at the very first position. For each motif, the lines starting with "#ID" and "#W" are mandatory, while the lines starting with "#Score" and "#Consensus" are optional. After these identifiers comes the data matrix itself. There must always be 4 columns, separated by tabs or spaces. The matrix must have as many rows as the value given in the preceding "#W" line.

--
#INCLUSive Motif Model v1.0
#
#ID = gbox
#Score = 274.657
#W = 8
#Consensus = nmCACGTG
0.1857  0.0559  0.5727  0.1857        
0.3518	0.5912  0.0559  0.0011      
0.0011  0.9973  0.0005  0.0011      
0.9979  0.0005  0.0005  0.0011      
0.0011  0.9973  0.0005  0.0011      
0.0011  0.0559  0.9419  0.0011      
0.0011  0.0005  0.0005  0.9979        
0.0011  0.0374  0.9604  0.0011      


#ID = sigma54
#Score = 91.5958
#W = 16
#Consensus = TGGCACrAnnnnTGCw
0.00575087      0.00400983      0.00393972      0.9863  
0.00575087      0.00400983      0.984465        0.00577424      
0.00575087      0.00400983      0.984465        0.00577424      
0.00575087      0.984535        0.00393972      0.00577424      
0.986276        0.00400983      0.00393972      0.00577424      
0.00575087      0.682835        0.00393972      0.307474        
0.458301        0.00400983      0.45649         0.0811993       
0.684576        0.30571         0.00393972      0.00577424      
0.232026        0.30571         0.0793647       0.382899        
0.382876        0.15486         0.15479         0.307474        
0.307451        0.30571         0.15479         0.232049        
0.156601        0.15486         0.15479         0.533749        
0.00575087      0.00400983      0.00393972      0.9863  
0.00575087      0.00400983      0.90904         0.0811993       
0.0811759       0.75826         0.00393972      0.156624        
0.609151        0.00400983      0.00393972      0.382899        

--
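
The rules above (mandatory "#ID" and "#W", four columns, "#W" rows) can be checked with a small Python sketch. It is an illustration only, not MotifScanner's own reader. It assumes that "#ID" is the first line of each motif block, as in the example, and that the columns are ordered A, C, G, T, which is inferred from comparing the gbox matrix with its "#Consensus" line rather than stated in the format description.

```python
SAMPLE = """#INCLUSive Motif Model v1.0
#
#ID = gbox
#Score = 274.657
#W = 8
#Consensus = nmCACGTG
0.1857  0.0559  0.5727  0.1857
0.3518\t0.5912  0.0559  0.0011
0.0011  0.9973  0.0005  0.0011
0.9979  0.0005  0.0005  0.0011
0.0011  0.9973  0.0005  0.0011
0.0011  0.0559  0.9419  0.0011
0.0011  0.0005  0.0005  0.9979
0.0011  0.0374  0.9604  0.0011
"""


def parse_motif_models(text):
    """Return a list of motif dicts, each with header fields and a 'matrix'."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#INCLUSive"):
        raise ValueError("file must start with #INCLUSive")
    motifs, current = [], None
    for line in lines[1:]:
        line = line.strip()
        if not line or line == "#":
            continue  # blank line or empty comment line
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if not sep:
                continue  # comment without "key = value"
            key, value = key.strip(), value.strip()
            if key == "ID":
                # Assumption: "#ID" opens a new motif block.
                current = {"ID": value, "matrix": []}
                motifs.append(current)
            elif current is not None:
                current[key] = value
        elif current is not None:
            row = [float(x) for x in line.split()]  # tabs or spaces
            if len(row) != 4:
                raise ValueError("matrix rows must have 4 columns")
            current["matrix"].append(row)
    for motif in motifs:
        if "W" not in motif:
            raise ValueError("motif %s lacks #W" % motif["ID"])
        if len(motif["matrix"]) != int(motif["W"]):
            raise ValueError("motif %s: row count differs from #W" % motif["ID"])
    return motifs


def best_bases(matrix, alphabet="ACGT"):
    """Most probable base per position, assuming A, C, G, T column order."""
    return "".join(alphabet[row.index(max(row))] for row in matrix)


motifs = parse_motif_models(SAMPLE)
gbox = motifs[0]
print(gbox["ID"], gbox["W"], best_bases(gbox["matrix"]))
```

For the gbox example, the most probable bases at the last six positions spell CACGTG, matching the upper-case part of the "#Consensus" line.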