Clustering

Introduction

You can use this web interface to cluster your gene expression data. The interface is intended to find groups of genes that have similar expression profiles. To find these groups we use Adaptive Quality-Based Clustering, an algorithm designed by Frank De Smet. To enter your data, save them as a tab-delimited ASCII text file. The correct format is described below.
If you have any comments or questions concerning this web site, please feel free to contact Gert Thijs. For specific questions about the clustering algorithm you can contact Frank De Smet.

If you like our software, you can always cite:
Frank De Smet, Janick Mathys, Kathleen Marchal, Gert Thijs, Bart De Moor and Yves Moreau. 2002. Adaptive quality-based clustering of gene expression profiles. Bioinformatics, 18(6), 735-746.
Additional information accompanying this paper can be found here.

Data Format

Before you start using this clustering web server, please check the required format of the data file.

Your data file should be a tab-delimited ASCII text file.
All fields should be tab separated.
All lines starting with a '#' are discarded. Any other line that does not contain measurements will corrupt the input.
For best results, consider log-transforming your data. It is not necessary to normalize the data; this is done within the core of the algorithm.
You can identify the genes in your data file with one or two leading columns.

First column: Primary identifier of the gene of interest: accession number, ...
Optionally, second column: e.g. gene name

All the other columns contain the expression levels as numerical values. If some values are missing, you can leave the fields blank or replace them with NaN ('not a number').
If you have any questions about the data format, take a look at the example or feel free to contact us.
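
To check a file against the format described above before uploading it, a small script along the following lines may help. It is only a sketch and not the parser used by the web server: the function name parse_expression_file, the n_id_columns argument and the sample identifiers are hypothetical, and treating a line with no measurement columns as an error is an assumption based on the warning that such lines corrupt the input.

```python
import io
import math


def parse_expression_file(handle, n_id_columns=2):
    """Read tab-delimited expression data into a list of (ids, values) rows.

    Hypothetical helper: lines starting with '#' are skipped, blank fields
    and 'NaN' become float NaN, and anything else must parse as a number.
    """
    rows = []
    for line_no, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # comment lines are discarded
        fields = line.split("\t")
        if len(fields) <= n_id_columns:
            # Assumption: a line without measurements is reported, not skipped.
            raise ValueError(f"line {line_no}: no measurement columns")
        ids = tuple(fields[:n_id_columns])
        values = []
        for field in fields[n_id_columns:]:
            text = field.strip()
            if text == "" or text.lower() == "nan":
                values.append(math.nan)  # missing value
                continue
            try:
                values.append(float(text))
            except ValueError:
                raise ValueError(
                    f"line {line_no}: non-numeric value {text!r}"
                ) from None
        rows.append((ids, values))
    return rows


# Illustrative input only; the identifiers and numbers are made up.
sample = "# time course\nID001\tgeneA\t1.5\t\tNaN\t-0.25\n"
rows = parse_expression_file(io.StringIO(sample), n_id_columns=2)
```

With n_id_columns=1 the same function would cover files that carry only the primary identifier column.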

Example

Here you can find an example of the data file with the first two columns as gene identifiers. An example of the results page, showing all the clusters found in this data set with the parameters MIN_NR_GENES = 3 and S = 0.95, can be found through this link. You can of course download the data and try the clustering software yourself.
The expression data used in this example were originally generated by Reymond et al. (2000), "Differential Gene Expression in Response to Mechanical Wounding and Insect Feeding in Arabidopsis", Plant Cell 12:707-720.

Identification

Email Address
The results will be sent to this address, so please make sure it is correct.

Identifier
Give an alphanumeric name to identify your expression data. Only letters, numbers and underscores can be used; do not use spaces, dots, colons or other punctuation in this name.
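
The naming rule above can be checked locally with a short regular expression. This is a sketch under the assumption that "letters" means the ASCII letters A-Z and a-z; the function name is_valid_identifier is hypothetical and the server may apply its own check.

```python
import re

# Assumption: only ASCII letters, digits and underscores are accepted.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def is_valid_identifier(name):
    """Return True if name consists solely of letters, digits and underscores."""
    return _IDENTIFIER.fullmatch(name) is not None
```

For example, a name such as wounding_exp1 passes this check, while "my data" (space) and "run.1" (dot) do not.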

Expression Data

Expression Data
Use the file upload field to select the file in which the expression data are stored. Please check that your input data conform to the described format.

Number of columns in the data set that contain gene identifiers:

Select the type of organism:

Parameters

Minimal number of genes in a cluster (MIN_NR_GENES):

Minimal probability of gene belonging to cluster (S, between 0.5 and 1):
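
The two parameters can be sanity-checked before submitting a job. The sketch below is not part of the web server: the function name check_parameters is hypothetical, and it assumes that MIN_NR_GENES must be a positive integer and that the bounds 0.5 and 1 for S are inclusive, which the form does not state.

```python
def check_parameters(min_nr_genes, s):
    """Validate MIN_NR_GENES and S; return them as (int, float).

    Assumptions: min_nr_genes is an integer >= 1, and 0.5 <= s <= 1.
    """
    if not isinstance(min_nr_genes, int) or min_nr_genes < 1:
        raise ValueError("MIN_NR_GENES must be a positive integer")
    if not 0.5 <= s <= 1.0:
        raise ValueError("S must lie between 0.5 and 1")
    return min_nr_genes, float(s)


# The values used for the example results page.
params = check_parameters(3, 0.95)
```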
