Command Line MotifScanner

Overview

  1. Required arguments
  2. Optional arguments
  3. Output definitions
  4. Background model description
  5. Motif model description

MotifScanner is our algorithm for finding pre-defined motifs in DNA sequences. It is based on a probabilistic sequence model that assumes motif instances are hidden in a noisy background sequence. The background itself is described by a higher-order background model.
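
To make the idea of "motif versus background" concrete, the sketch below scores one DNA segment by the log-ratio of its probability under a motif matrix and under a first-order background model. It is only an illustration and not MotifScanner's actual scoring code. The numbers are copied from the example files shown further down this page; the function name `log_odds`, the column order A, C, G, T, and the reading of the transition matrix (row = previous base, column = next base) are assumptions.

```python
import math

BASES = "ACGT"  # assumed column order of all matrices

# "gbox" motif matrix from the motif model example on this page.
GBOX = [
    [0.1857, 0.0559, 0.5727, 0.1857],
    [0.3518, 0.5912, 0.0559, 0.0011],
    [0.0011, 0.9973, 0.0005, 0.0011],
    [0.9979, 0.0005, 0.0005, 0.0011],
    [0.0011, 0.9973, 0.0005, 0.0011],
    [0.0011, 0.0559, 0.9419, 0.0011],
    [0.0011, 0.0005, 0.0005, 0.9979],
    [0.0011, 0.0374, 0.9604, 0.0011],
]

# First-order background values from the background model example on this page.
SNF = [0.3449, 0.1581, 0.1556, 0.3414]
TRANSITION = [  # assumed: row = previous base, column = next base
    [0.3911, 0.1516, 0.1482, 0.3091],
    [0.3760, 0.1703, 0.1268, 0.3269],
    [0.3630, 0.1389, 0.1671, 0.3311],
    [0.2756, 0.1678, 0.1711, 0.3855],
]


def log_odds(segment, motif, snf, transition):
    """Return log P(segment | motif) - log P(segment | background)."""
    if len(segment) != len(motif):
        raise ValueError("segment length must equal the motif width")
    idx = [BASES.index(base) for base in segment.upper()]
    # Motif model: positions are independent, one matrix row per position.
    log_motif = sum(math.log(motif[pos][i]) for pos, i in enumerate(idx))
    # Background: first base from single-nucleotide frequencies,
    # every following base conditioned on its predecessor.
    log_background = math.log(snf[idx[0]])
    for prev, cur in zip(idx, idx[1:]):
        log_background += math.log(transition[prev][cur])
    return log_motif - log_background


if __name__ == "__main__":
    for segment in ("GCCACGTG", "AAAAAAAA"):
        print(segment, round(log_odds(segment, GBOX, SNF, TRANSITION), 2))
```

A segment that resembles the motif gets a positive score, while a segment that looks like ordinary background gets a negative one.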
Some basic remarks on the program:

Required Arguments

Switch  Argument  Description
-f      file      Input sequences in FASTA format.
-m      file      Motif model description. The file format is described below.
-b      file      Background model description. The file format is described below.


Optional Arguments

Switch  Argument  Description
-s      0|1       Search a single strand (0) or both strands (1). Default = 0.
-p      value     Prior probability of finding one instance of the motif. This value lets the user define the type of motif to search for. If the prior is set close to 0, only the best-conserved instances are retrieved; increasing the prior allows more degeneracy in the instances found.
-l      file      File with the IDs of the motif models that should be selected from the given matrix file. Such a list is useful when you only want to test a subset of the motif models in your matrix file, without having to edit the matrix file.


Output Description

Switch  Argument  Description
-o      file      Output file in which to save the results. The motif instances found are written to this file in GFF format. By default, the results are written to STDOUT.
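
The switches listed above can be assembled into a command line programmatically. The following is a minimal sketch that uses only the switches documented on this page. The executable name `MotifScanner`, the helper name `build_command`, and all file names are assumptions, so adjust them to your installation.

```python
def build_command(fasta, motifs, background, *, strand=None, prior=None,
                  id_list=None, output=None, executable="MotifScanner"):
    """Build an argument list from the switches documented on this page.

    The executable name is an assumption; the real binary may be named
    differently on your system.
    """
    cmd = [executable, "-f", fasta, "-m", motifs, "-b", background]
    if strand is not None:
        if strand not in (0, 1):
            raise ValueError("-s accepts 0 (single strand) or 1 (both strands)")
        cmd += ["-s", str(strand)]
    if prior is not None:
        if not 0.0 <= prior <= 1.0:
            raise ValueError("-p is a probability and must lie in [0, 1]")
        cmd += ["-p", str(prior)]
    if id_list is not None:
        cmd += ["-l", id_list]
    if output is not None:
        cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_command("seqs.fasta", "motifs.mtrx", "athaliana.bg",
                        strand=1, prior=0.2, output="hits.gff"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.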


Background Model Description

A pre-compiled background model is stored in an ASCII text file using a well-defined (and not very flexible) format. To create such a background model file from a set of input sequences, you can use the tool available on our download page.
Below is an example of the first-order Arabidopsis thaliana background model file. The file must always start with the word #INCLUSive at the very first position of the file. Next come several lines describing the organism, the data set and the order of the background model. Finally, the data themselves are listed.

--
#INCLUSive Background Model v1.0
#
#Order = 1
#Organism = athaliana
#Sequences = /users/sista/thijs/scratch/DNA/intergenic.tfa
#Path = 
#

#snf
0.3449  0.1581  0.1556  0.3414  

#oligo frequency 
0.3449
0.1581
0.1556
0.3414

#transition matrix
0.3911  0.1516  0.1482  0.3091  
0.3760  0.1703  0.1268  0.3269  
0.3630  0.1389  0.1671  0.3311  
0.2756  0.1678  0.1711  0.3855  
--
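
A file like the one above can be read with a few lines of code. The sketch below is inferred from this single example only, so treat it as a starting point rather than a full format specification. The function name `parse_background_model` is hypothetical, and the rule that `#key = value` lines are header fields while other `#` lines open a data section is an assumption.

```python
def parse_background_model(text):
    """Parse an INCLUSive background model into (header, sections)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#INCLUSive"):
        raise ValueError("file must start with #INCLUSive")
    header, sections, current = {}, {}, None
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line == "#":
            continue  # blank lines and bare "#" separators carry no data
        if line.startswith("#"):
            body = line[1:].strip()
            if "=" in body:
                # Header field such as "#Order = 1"
                key, value = body.split("=", 1)
                header[key.strip()] = value.strip()
            else:
                # Section title such as "#transition matrix"
                current = body.lower()
                sections[current] = []
        else:
            if current is None:
                raise ValueError("numeric data found before any section title")
            sections[current].append([float(x) for x in line.split()])
    return header, sections


EXAMPLE = """#INCLUSive Background Model v1.0
#
#Order = 1
#Organism = athaliana
#

#snf
0.3449  0.1581  0.1556  0.3414

#oligo frequency
0.3449
0.1581
0.1556
0.3414

#transition matrix
0.3911  0.1516  0.1482  0.3091
0.3760  0.1703  0.1268  0.3269
0.3630  0.1389  0.1671  0.3311
0.2756  0.1678  0.1711  0.3855
"""

if __name__ == "__main__":
    header, sections = parse_background_model(EXAMPLE)
    print(header["Order"], header["Organism"])
    for row in sections["transition matrix"]:
        print(row, round(sum(row), 4))
```

In the example, each row of the transition matrix sums to 1 within rounding error, which makes a convenient sanity check after parsing.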

Motif Model Description

A set of pre-compiled motif models should be stored in an ASCII text file using a well-defined (and not very flexible) format. Such a file can be the result of the MotifSampler.
Below is an example of a motif model file with 2 motifs. The file must always start with the word #INCLUSive at the very first position of the file. For each motif, the lines starting with "#ID" and "#W" are obligatory, while the lines starting with "#Score" and "#Consensus" are optional. After the identifiers, the data matrix itself follows. The matrix must always have 4 columns, separated by tabs or spaces, and as many rows as the value given on the motif's "#W" line.

--
#INCLUSive Motif Model v1.0
#
#ID = gbox
#Score = 274.657
#W = 8
#Consensus = nmCACGTG
0.1857  0.0559  0.5727  0.1857        
0.3518	0.5912  0.0559  0.0011      
0.0011  0.9973  0.0005  0.0011      
0.9979  0.0005  0.0005  0.0011      
0.0011  0.9973  0.0005  0.0011      
0.0011  0.0559  0.9419  0.0011      
0.0011  0.0005  0.0005  0.9979        
0.0011  0.0374  0.9604  0.0011      


#ID = sigma54
#Score = 91.5958
#W = 16
#Consensus = TGGCACrAnnnnTGCw
0.00575087      0.00400983      0.00393972      0.9863  
0.00575087      0.00400983      0.984465        0.00577424      
0.00575087      0.00400983      0.984465        0.00577424      
0.00575087      0.984535        0.00393972      0.00577424      
0.986276        0.00400983      0.00393972      0.00577424      
0.00575087      0.682835        0.00393972      0.307474        
0.458301        0.00400983      0.45649         0.0811993       
0.684576        0.30571         0.00393972      0.00577424      
0.232026        0.30571         0.0793647       0.382899        
0.382876        0.15486         0.15479         0.307474        
0.307451        0.30571         0.15479         0.232049        
0.156601        0.15486         0.15479         0.533749        
0.00575087      0.00400983      0.00393972      0.9863  
0.00575087      0.00400983      0.90904         0.0811993       
0.0811759       0.75826         0.00393972      0.156624        
0.609151        0.00400983      0.00393972      0.382899        

--
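
The rules above (obligatory "#ID" and "#W" lines, 4 columns, as many rows as "#W" states) translate into a small validating reader. This is a minimal sketch based on the example file, not the parser used by MotifScanner. The function name `parse_motif_models` is hypothetical, and it assumes that "#ID" is the first identifier line of each motif, as in the example. The column order A, C, G, T used in the usage lines is inferred from the "#Consensus" strings.

```python
def parse_motif_models(text):
    """Parse an INCLUSive motif model file into a list of motif dicts."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#INCLUSive"):
        raise ValueError("file must start with #INCLUSive")
    motifs, current = [], None
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line == "#":
            continue
        if line.startswith("#"):
            key, _, value = line[1:].partition("=")
            key, value = key.strip(), value.strip()
            if key == "ID":
                # Assumed: "#ID" opens a new motif record.
                current = {"ID": value, "matrix": []}
                motifs.append(current)
            elif current is not None:
                current[key] = value
        else:
            if current is None:
                raise ValueError("matrix row found before any #ID line")
            row = [float(x) for x in line.split()]  # tabs or spaces
            if len(row) != 4:
                raise ValueError(f"{current['ID']}: expected 4 columns")
            current["matrix"].append(row)
    for motif in motifs:
        if "W" not in motif:
            raise ValueError(f"{motif['ID']}: missing obligatory #W line")
        if len(motif["matrix"]) != int(motif["W"]):
            raise ValueError(f"{motif['ID']}: row count does not match #W")
    return motifs


EXAMPLE = """#INCLUSive Motif Model v1.0
#
#ID = gbox
#Score = 274.657
#W = 8
#Consensus = nmCACGTG
0.1857  0.0559  0.5727  0.1857
0.3518  0.5912  0.0559  0.0011
0.0011  0.9973  0.0005  0.0011
0.9979  0.0005  0.0005  0.0011
0.0011  0.9973  0.0005  0.0011
0.0011  0.0559  0.9419  0.0011
0.0011  0.0005  0.0005  0.9979
0.0011  0.0374  0.9604  0.0011
"""

if __name__ == "__main__":
    for motif in parse_motif_models(EXAMPLE):
        # Most probable base per position, assuming columns are A, C, G, T.
        best = "".join("ACGT"[row.index(max(row))] for row in motif["matrix"])
        print(motif["ID"], motif["W"], best)
```

Checking the row count against "#W" at load time catches truncated or hand-edited matrices before they reach the scanner.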


This page is maintained by Gert Thijs. Last update 2005/06/10.
Email: gert.thijs@esat.kuleuven.be
Copyright © 2002-2005 KULeuven.