  Adaptive quality-based clustering
SISTA

Clustering gene expression profiles

  People: Frank De Smet, Patrick Glenisson, Kathleen Marchal, Janick Mathys, Yves Moreau, Gert Thijs

Adaptive quality-based clustering of gene expression profiles (Web Interface)
Clustering genes by their expression behaviour (their profiles, as measured with microarrays) is an important step before further analysis of the interactions between these genes. Based on the hypothesis that similarity in expression (coexpression) implies similarity in regulatory mechanisms (coregulation), a clustering algorithm only needs to group genes that show a significant degree of coexpression; the remaining genes should be excluded from further analysis.
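Coexpression needs a concrete similarity measure. A common choice, assumed here rather than taken from the text, is to normalize each profile to zero mean and unit length; the squared Euclidean distance between two normalized profiles then equals 2(1 - r), where r is their Pearson correlation. The helper names and toy profiles below are hypothetical.

```python
import math


def normalize(profile):
    """Shift a profile to zero mean and scale it to unit Euclidean length."""
    mean = sum(profile) / len(profile)
    centred = [x - mean for x in profile]
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0:
        raise ValueError("a constant profile cannot be normalized")
    return [c / norm for c in centred]


def pearson(x, y):
    """Pearson correlation computed directly from the raw profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance(x, y):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical toy profiles (four time points each).
gene_a = [1.0, 2.0, 4.0, 3.0]
gene_b = [2.0, 4.1, 7.9, 6.2]  # roughly twice gene_a: strongly coexpressed
gene_c = [4.0, 3.0, 1.0, 2.0]  # mirror image of gene_a around its mean

for other in (gene_b, gene_c):
    r = pearson(gene_a, other)
    d = distance(normalize(gene_a), normalize(other))
    # After normalization, d**2 equals 2 * (1 - r) up to rounding.
    print(f"r = {r:+.4f}  d^2 = {d * d:.4f}  2(1 - r) = {2 * (1 - r):.4f}")
```

Because of this identity, requiring a normalized profile to lie within distance R of another is the same as requiring a correlation of at least 1 - R²/2, which is one way to read "a significant degree of coexpression".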
With these remarks in mind, we designed an iterative two-step algorithm. First, we try to find an area in the data where the 'density' of expression profiles is locally maximal, based on a preliminary estimate of the cluster radius (a quality-based approach). In the second step, we derive the true radius (or quality) of the cluster by fitting a model to the data with an EM algorithm. This model assumes that the data are normalized, which should always be the case for gene expression profiles. Because the radius (quality) is inferred from the data itself, the biologist no longer has to estimate this parameter manually, which was sometimes hard to do.
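The two steps can be sketched in Python as follows. This is a minimal, hypothetical illustration built on assumptions that are not taken from the text: profiles are normalized to zero mean and unit length; step 1 is approximated by a flat-kernel mean shift, seeded from every profile with the preliminary radius, that keeps the end point with the most profiles inside that radius; step 2 runs EM on the distances to that centre with a two-component mixture: a scaled chi distribution for cluster members (Gaussian scatter around the centre) and, for the background, the distance distribution of profiles spread uniformly over the normalized sphere. Profiles whose posterior cluster probability reaches a significance level (0.95 here) become members, and the largest member distance is reported as the estimated radius. Removing each cluster and repeating until no sufficiently large cluster is found is likewise an assumption about how the iteration proceeds. All function names, parameter values and distance models are hypothetical, not the authors' published model or code.

```python
import math
import random
# Hypothetical sketch: every name, parameter value and both distance models
# are assumptions for illustration, not the authors' published model or code.

def normalize(p):
    """Zero mean, unit Euclidean length (constant profiles are not handled)."""
    m = sum(p) / len(p)
    c = [x - m for x in p]
    s = math.sqrt(sum(x * x for x in c))
    return [x / s for x in c]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def unit_mean(points):
    """Mean of the points, projected back onto the unit sphere."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    s = math.sqrt(sum(x * x for x in mean))
    return [x / s for x in mean]

def find_center(points, radius, max_steps=20):
    """Step 1 stand-in: flat-kernel mean shift from every profile using the
    preliminary radius; keep the end point with the most profiles inside."""
    best, best_count = None, -1
    for center in points:
        for _ in range(max_steps):
            new = unit_mean([p for p in points if dist(p, center) <= radius])
            if dist(new, center) < 1e-12:
                break
            center = new
        count = sum(dist(p, center) <= radius for p in points)
        if count > best_count:
            best, best_count = center, count
    return best

def log_cluster(d, var, k):
    """Assumed cluster model: Gaussian scatter in k directions (scaled chi)."""
    return ((k - 1) * math.log(d) - d * d / (2 * var) - (k / 2 - 1) * math.log(2)
            - (k / 2) * math.log(var) - math.lgamma(k / 2))

def log_background(d, n):
    """Assumed background: profiles spread uniformly over the unit sphere in R^n."""
    return (math.lgamma(n / 2) - 0.5 * math.log(math.pi) - math.lgamma((n - 1) / 2)
            + (n - 2) * math.log(d) + (n - 3) / 2 * math.log(1 - d * d / 4))

def posteriors(d, prior, var, n):
    """Posterior probability that each distance comes from the cluster component."""
    out = []
    for x in d:
        log_odds = (math.log(1 - prior) + log_background(x, n)
                    - math.log(prior) - log_cluster(x, var, n - 1))
        out.append(1 / (1 + math.exp(min(max(log_odds, -700), 700))))
    return out

def em_members(points, center, radius, significance, steps=100):
    """Step 2: EM on the distances to the centre; members reach `significance`."""
    n = len(center) - 1  # zero-mean profiles span a (D-1)-dimensional subspace
    k = n - 1            # assumed number of noise directions around the centre
    d = [min(max(dist(p, center), 1e-12), 2 - 1e-12) for p in points]
    inside = [x for x in d if x <= radius]
    if len(inside) < 2:
        return [], 0.0
    prior = min(len(inside) / len(d), 1 - 1e-6)
    var = max(sum(x * x for x in inside) / (k * len(inside)), 1e-12)
    for _ in range(steps):
        w = posteriors(d, prior, var, n)                   # E-step
        total = max(sum(w), 1e-12)
        prior = min(max(total / len(d), 1e-6), 1 - 1e-6)  # M-step: weight, scale
        var = max(sum(wi * x * x for wi, x in zip(w, d)) / (k * total), 1e-12)
    w = posteriors(d, prior, var, n)
    members = [i for i, wi in enumerate(w) if wi >= significance]
    return members, max((d[i] for i in members), default=0.0)

def adaptive_qbc(profiles, radius, significance=0.95, min_size=3):
    """Repeat both steps, removing each cluster; leftover genes stay unassigned."""
    if not 0 < radius < math.sqrt(2):
        raise ValueError("preliminary radius must lie in (0, sqrt(2))")
    data = [normalize(p) for p in profiles]
    remaining, clusters = list(range(len(data))), []
    while len(remaining) >= min_size:
        points = [data[g] for g in remaining]
        center = find_center(points, radius)
        members, r = em_members(points, center, radius, significance)
        if len(members) < min_size:
            break
        clusters.append(([remaining[i] for i in members], r))
        taken = set(members)
        remaining = [g for i, g in enumerate(remaining) if i not in taken]
    return clusters, remaining

rng = random.Random(0)
proto_a = [math.sin(2 * math.pi * t / 8) for t in range(8)]
proto_b = [math.cos(2 * math.pi * t / 8) for t in range(8)]
genes = [[v + rng.gauss(0, 0.3) for v in proto_a] for _ in range(20)]
genes += [[v + rng.gauss(0, 0.3) for v in proto_b] for _ in range(20)]
genes += [[rng.gauss(0, 1) for _ in range(8)] for _ in range(30)]  # unrelated genes
clusters, unassigned = adaptive_qbc(genes, radius=0.6)
print([(len(m), round(r, 3)) for m, r in clusters], len(unassigned), "unassigned")
```

On the synthetic data (two noisy wave patterns plus unrelated random profiles), the sketch is expected to report two clusters and leave most random profiles unassigned. The user supplies only the preliminary radius, the significance level and a minimum cluster size; the final radius comes out of the EM fit. Seeding the mean shift from every profile costs O(n²) distance evaluations per cluster, a choice made for simplicity rather than speed.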
The most important properties of this approach are:

  1. The number of user-defined parameters is minimal.
  2. Not all genes are assigned to a cluster.
  3. The algorithm is relatively fast (a definitive speed comparison has yet to be performed).
We successfully tested this algorithm on several data sets from the literature and on artificially created data.

K.U.Leuven - CWIS
Copyright © 1998 Katholieke Universiteit Leuven
Design: Gert Thijs
Last update: 2001/03/13