INCLUSive - MotifSampler

Introduction

MotifSampler searches for over-represented motifs in the upstream regions of a set of co-regulated genes. The algorithm uses Gibbs sampling to estimate the position probability matrix that represents the motif. This implementation focuses on higher-order background models to make motif finding more robust. MotifSampler currently comes with background models for several organisms (see the pop-up list further down the page).
This web interface is only intended to hint at the capabilities of MotifSampler. MotifSampler is a stochastic method and can give different results when run several times with the same parameters. This might seem like a disadvantage, but it becomes a real advantage when the tool is used constructively (doing multiple runs and post-processing the results). For this type of analysis, use the stand-alone version rather than the website. Please also check the help pages for more information.

Mailing List
If you would like to be informed of updates and new versions of our programs, you can join our INCLUSive mailing list.

If you like our software, you can show your appreciation by citing one of our publications:

Thijs G., Lescot M., Marchal K., Rombauts S., De Moor B., Rouzé P., Moreau Y., 2001. A higher order background model improves the detection of regulatory elements by Gibbs Sampling, Bioinformatics, 17(12), 1113-1122.
Thijs G., Marchal K., Lescot M., Rombauts S., De Moor B., Rouzé P., Moreau Y., 2002. A Gibbs Sampling method to detect over-represented motifs in upstream regions of coexpressed genes, Journal of Computational Biology (special issue Recomb'2001), 9(2), 447-464.

For questions about this web site or if you encounter any problems, do not hesitate to contact Gert Thijs. We are open to any suggestions or remarks concerning this web interface.

Identification + Sequences

Email Address
The results will be sent to this address, so please make sure it is correct.

Identifier
Give an alphanumeric name to identify your sequences. Only letters, numbers and underscores can be used; do not use spaces, dots, colons or other punctuation in this name.
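To check an identifier before submitting, the rule above can be expressed as a regular expression. This is a minimal sketch, not the check the server itself runs: the function name is_valid_identifier is hypothetical, and it assumes that "letters" means ASCII letters only.

```python
import re

# Assumption: only ASCII letters, digits and underscores are accepted.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_identifier(name: str) -> bool:
    """Return True if name contains only letters, digits and underscores."""
    return _IDENTIFIER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["yeast_set_01", "my set", "run.2", ""]:
        print(repr(candidate), is_valid_identifier(candidate))
```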

DNA Sequences
DNA sequences should be submitted in Fasta format (example). The data set must contain at least 2 sequences. Your browser sets an upper limit on the total size of the sequences you can paste in the text field. When using the upload field, the number of sequences is limited to 150 for computational reasons. Submitting very large data sets will slow down the computation significantly.
Please do not include any non-[ACGT] symbols. Sequences containing such symbols will be excluded from the data set.
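The requirements above can be checked locally before submitting. The sketch below is an illustration only, not the parser used by the server: the function names parse_fasta and filter_acgt are hypothetical, and it assumes that lower-case bases are acceptable and can be converted to upper case.

```python
def parse_fasta(text):
    """Parse Fasta text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_acgt(records):
    """Split records into those with only A, C, G, T and the excluded headers."""
    kept, excluded = [], []
    for header, seq in records:
        seq = seq.upper()  # assumption: case is not significant
        if seq and set(seq) <= set("ACGT"):
            kept.append((header, seq))
        else:
            excluded.append(header)
    return kept, excluded


if __name__ == "__main__":
    example = ">geneA\nACGTAC\nGGTA\n>geneB\nACNNGT\n>geneC\nttgaca\n"
    kept, excluded = filter_acgt(parse_fasta(example))
    print("kept:", [h for h, _ in kept])
    print("excluded:", excluded)
    if len(kept) < 2:
        print("Fewer than 2 usable sequences remain.")
```

Running this on your own file shows which sequences would be dropped and whether at least 2 remain.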

Paste your DNA sequences here in Fasta format:

Or you can upload a file with all the sequences in Fasta format here:

Check this box if you would like to include both strands in the analysis (checked by default).
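Including both strands means that motif instances are also searched on the reverse complement of each input sequence. As a small illustration of what the second strand looks like (the function name reverse_complement is hypothetical and this is not MotifSampler's own code):

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACG"))  # prints CGTT
```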

Background Model

Here you can choose an appropriate background model. We have precompiled an independent background model for several organisms, based on data sets of upstream sequences that do not overlap with preceding genes. You can use one of these models; otherwise the background model will be computed from the input sequences.
You also have to set the order of the background model. This number defines the order of the Markov process that describes the background; 0 means a background model based on single nucleotide frequencies. If you do not use the precompiled models, keep in mind that the usable order is limited by the number of nucleotides in your data set. It is better to restrict the order of the background model to 0 or 1 if you use the input sequences. The maximal order is limited to 4 (higher values will certainly give bad results).
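To make the role of the order concrete, the sketch below estimates a Markov background model of a given order from a set of sequences. It is a simplified illustration, not the estimation procedure used by MotifSampler: the function name markov_background and the pseudocount parameter are assumptions introduced here. An order-n model has 4^n contexts (256 for order 4), each with its own nucleotide distribution, which is why a small data set cannot support a high order.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def markov_background(sequences, order, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from upper-case sequences.

    Returns a dict mapping each context string of length `order`
    to a dict of probabilities over A, C, G, T.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]  # empty string when order is 0
            counts[context][seq[i]] += 1

    model = {}
    for ctx in ("".join(p) for p in product(ALPHABET, repeat=order)):
        # The pseudocount (an assumption of this sketch) avoids zero
        # probabilities for transitions never seen in the data.
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {
            base: (counts[ctx][base] + pseudocount) / total
            for base in ALPHABET
        }
    return model


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTGACAAT"]
    for order in range(5):
        print(order, "contexts:", len(markov_background(seqs, order)))
```

With only a few short sequences, most contexts of a higher-order model are never observed and their probabilities are determined by the pseudocount alone, which matches the advice to stay at order 0 or 1 when the model is built from the input sequences.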

Follow this link for more information on the prokaryotic background models.

Choose from:

Basic Parameters

Motif length:
Prior probability of finding 1 motif instance (between 0 and 1):
Number of different motifs (limited to 6; more runs are better than more motifs):

Specific Parameters

Normally you do not need to adjust these more specific parameters. You should only change the maximum number of motif instances per sequence if you want to limit the number of instances per sequence to a maximum of 1 or 2.

Maximum number of motif instances per sequence:
(0 means no limit set)
Maximum allowed overlap between different motifs: