Introduction

The Motif Sampler searches for over-represented motifs in the upstream regions of a set of co-regulated genes. The algorithm uses Gibbs sampling to find the position probability matrix that represents the motif. This implementation focuses on higher-order background models to improve the robustness of motif finding. At the moment the Motif Sampler comes with background models for several organisms (see the pop-up list further down the page). It is also suitable for other organisms, since the background model can be calculated from the input sequences.
This is the web interface to the latest version of our Motif Sampler algorithm. Newly added features are the option to include both strands in the analysis, more precompiled background models, and some internal optimizations of the code to speed up the computations. The web interface and the motif finding algorithm are the subject of ongoing research and therefore remain under development, but this version is ready to search for common motifs in a set of co-regulated DNA sequences. Since we are still working on the interface and the data processing, the server may be down once in a while.

NEW: Since Aug. 1, 2002, you can also download the command-line version of the MotifSampler.

Mailing List: If you would like to be informed of updates and new versions of our programs, you should join our INCLUSive mailing list.

If you like our software, you can show your appreciation by citing one of our publications:


For questions about this web site or if you encounter any problems, do not hesitate to contact Gert Thijs. We are open to any suggestions or remarks concerning this web interface.

Identification + Sequences

Email Address
The results will be sent to this address, so please make sure it is correct.


Identifier
Give an alphanumeric name to identify your sequences. Only letters, numbers and underscores can be used; do not use spaces, dots, colons or other characters in this name.
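
If you want to check an identifier before submitting, the rule above can be sketched as a short Python test. This is only an illustration of the stated rule, not the check the server runs; the names IDENTIFIER_PATTERN and is_valid_identifier are hypothetical.

```python
import re

# Hypothetical helper: mirrors the rule stated on this page
# (letters, numbers and underscores only; at least one character).
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_identifier(name: str) -> bool:
    """Return True if the whole name consists of allowed characters."""
    return IDENTIFIER_PATTERN.fullmatch(name) is not None
```

For example, a name such as yeast_set_1 passes, while names containing a space or a dot do not.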


DNA Sequences
DNA sequences should be submitted in Fasta format (example). There should be at least 2 sequences in the data set. Your browser sets an upper limit on the total size of the sequences you can paste in the text field. When using the text upload field, the number of sequences is limited to 150 for computational reasons. Submitting very large data sets will slow down computation significantly.
Please do not include any non-[ACGT] symbols. Sequences containing such symbols will be excluded from the data set.
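
To preview which sequences would survive these rules, a local pre-check could look roughly like the Python sketch below. It is written under assumptions and does not reproduce the server's actual processing: the function names are hypothetical, lowercase letters are converted to uppercase before checking, and the limits (at least 2, at most 150) are applied after invalid sequences have been dropped.

```python
def parse_fasta(text):
    """Parse Fasta text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Assumption: lowercase input is treated like uppercase.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_dataset(records, min_sequences=2, max_sequences=150):
    """Drop sequences with non-ACGT symbols, then check the set size."""
    valid = set("ACGT")
    kept = [(h, s) for h, s in records if s and set(s) <= valid]
    # Assumption: the limits are checked after filtering.
    if len(kept) < min_sequences:
        raise ValueError("fewer than %d usable sequences" % min_sequences)
    if len(kept) > max_sequences:
        raise ValueError("more than %d sequences" % max_sequences)
    return kept
```

Calling check_dataset(parse_fasta(text)) returns the sequences that contain only A, C, G and T, or raises an error if too few or too many remain.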

Paste your DNA sequences here in Fasta format:


Or you can upload a file with all the sequences in Fasta format here:


Check this box if you would like to include both strands in the analysis (checked by default).
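
Including both strands means that a motif may also be found on the reverse complement of each submitted sequence. As a minimal illustration of what the reverse complement is (not of how the Motif Sampler handles it internally), consider this Python sketch; the function name is hypothetical.

```python
# Translation table that maps each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACGT"))  # prints ACGTT
```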

Background Model

Here you can choose an appropriate background model. We have precompiled an independent background model for several organisms, based on data sets of upstream sequences that do not overlap with preceding genes. You can use one of these models; otherwise the background model will be computed from the input sequences.
You also have to set the order of the background model. The number defines the order of the Markov process that describes the background; 0 means a background model based on single nucleotide frequencies. If you do not use the precompiled models, keep in mind that the order is limited by the number of nucleotides in your data set. It is better to restrict the order of the background model to 0 or 1 if you use the input sequences. The maximal order is limited to 4 (higher values will certainly give bad results).
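
To see why the order is limited by the amount of data, it can help to look at how such a model might be estimated. The Python sketch below counts, for every context of `order` preceding nucleotides, how often each nucleotide follows, and normalizes the counts to probabilities. It is an illustration under assumptions, not the Motif Sampler's own estimation code: the function name and the pseudocount smoothing are ours.

```python
from itertools import product


def background_model(sequences, order, pseudocount=1.0):
    """Estimate P(next nucleotide | previous `order` nucleotides).

    Returns a dict mapping each context string (length `order`) to a
    dict of probabilities over A, C, G, T. The pseudocount is an
    assumption added so that unseen contexts do not get zero counts.
    """
    alphabet = "ACGT"
    # One row of counts per possible context: 4**order rows in total.
    counts = {
        "".join(ctx): dict.fromkeys(alphabet, pseudocount)
        for ctx in product(alphabet, repeat=order)
    }
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]  # empty string when order is 0
            counts[context][seq[i]] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {base: n / total for base, n in row.items()}
    return model
```

An order-0 model has a single context (the empty string) and reduces to single nucleotide frequencies. An order-4 model has 4^4 = 256 contexts, each with its own distribution, so a small data set leaves most contexts with very few observations.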

 Follow this link for more information on the prokaryotic background models.

Choose from:       


Parameters

Motif length (e.g. between 5 and 15):  
Maximum number of copies per sequence (e.g. between 1 and 5):  
Number of different motifs (limited to 15):  
Maximum allowed overlap between different motifs:  




This page is developed and maintained by Gert Thijs.
Email: gert.thijs@esat.kuleuven.ac.be
Copyright © 2002, K.U.Leuven.
Last Update: 2002/07/12.