




People: Frank De Smet, Patrick Glenisson, Kathleen Marchal, Janick Mathys, Yves Moreau, Gert Thijs
Adaptive quality-based clustering of gene expression profiles (Web Interface)
Clustering genes by their expression profiles (measured with microarrays) is an important step that precedes further analysis of the interactions between these genes. Under the hypothesis that similarity in expression (coexpression) implies similarity in regulatory mechanisms (coregulation), a clustering algorithm only needs to group genes that show a significant degree of coexpression; all other genes should be excluded from further analysis.
With these remarks in mind, we designed an iterative two-step algorithm. In the first step, we look for a region of the data where the 'density' of expression profiles is locally maximal, using a preliminary estimate of the cluster radius (a quality-based approach). In the second step, we derive the true radius (or quality) of the cluster by fitting a model to the data with an EM algorithm. The model assumes that the data are normalized, which should always be the case for gene expression profiles. Because the radius, or quality, is inferred from the data itself, the biologist no longer has to estimate this parameter manually, a task that was often difficult.
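The two steps can be illustrated with a minimal Python sketch. It is not the implementation behind the web interface, and it rests on several assumptions: the function names (normalize_profiles, locate_center, estimate_radius, find_cluster) and the parameters prelim_radius and min_prob are hypothetical; normalization is taken to mean zero mean and unit variance per profile; and a plain two-component one-dimensional Gaussian mixture over distances to the cluster center stands in for the model mentioned above, whose exact form is not given here. The sketch finds a single cluster; repeating it on the remaining genes would be one way to make it iterative.

```python
import numpy as np

def normalize_profiles(X):
    """Scale each profile (row) to zero mean and unit variance (assumed normalization)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)
    std = centered.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # flat profiles stay at zero instead of dividing by 0
    return centered / std

def locate_center(X, prelim_radius, n_iter=50):
    """Step 1: move a sphere of fixed radius onto a locally dense region."""
    # Start at the profile with the most neighbours inside the preliminary
    # radius. This builds an n x n distance matrix, so it suits small data only.
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    center = X[np.argmax((pairwise <= prelim_radius).sum(axis=1))]
    for _ in range(n_iter):
        inside = np.linalg.norm(X - center, axis=1) <= prelim_radius
        if not inside.any():
            break
        new_center = X[inside].mean(axis=0)  # recentre on the enclosed profiles
        if np.allclose(new_center, center):
            break
        center = new_center
    return center

def estimate_radius(dist, prelim_radius, min_prob=0.95, n_iter=200):
    """Step 2 (stand-in model): EM fit of a two-component 1-D Gaussian mixture."""
    near = dist <= prelim_radius
    if near.all() or not near.any():
        return float(prelim_radius)  # nothing to separate; keep the first guess
    # Component 0 starts as "cluster", component 1 as "background".
    mu = np.array([dist[near].mean(), dist[~near].mean()])
    sigma = np.maximum([dist[near].std(), dist[~near].std()], 1e-3)
    weight = np.array([near.mean(), 1.0 - near.mean()])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for every distance.
        z = (dist[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / nk.sum()
        mu = (resp * dist[:, None]).sum(axis=0) / nk
        var = (resp * (dist[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.maximum(np.sqrt(var), 1e-3)
    k = np.argmin(mu)  # the component closer to the center is the cluster
    member = (resp[:, k] >= min_prob) & (dist < mu.max())
    return float(dist[member].max()) if member.any() else float(prelim_radius)

def find_cluster(X, prelim_radius):
    """Run both steps once; return the center and the data-derived radius."""
    center = locate_center(X, prelim_radius)
    dist = np.linalg.norm(X - center, axis=1)
    return center, estimate_radius(dist, prelim_radius)
```

In this sketch, genes whose distance to the returned center exceeds the returned radius are simply left unassigned. Only the preliminary radius and the membership probability min_prob remain as user-set values.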
The most important properties of this approach are:
- The number of user-defined parameters is minimal.
- Not all genes are assigned to a cluster.
- The algorithm is relatively fast (a definitive speed comparison has yet to be performed).
We successfully tested this algorithm on several data sets from the literature and on artificially generated data.