  Data Fusion and Integrated Learning in MicroArray Data Analysis
SISTA


The spread of microarray (MA) technology, which makes it possible to measure the activity of a large number of genes simultaneously, has given new impetus to a broader understanding of their functional roles and their complex interaction networks. The complexity of these fundamental biological questions, however, demands systems that allow an open-ended organization of human expertise, the measured statistical data, and the electronic domain literature.

The availability of large amounts of electronic domain literature has fundamentally changed the traditional way of analyzing data. A significant amount of domain knowledge is now available in electronic form, either unstructured (e.g. natural language documents) or structured (e.g. structured documents, databases, and knowledge bases). Many efforts in the bioinformatics community are currently directed towards the development of machine-friendly representations, the standardization of state-of-the-art biological information resources, and the establishment of cross-linked or combined information sources. Despite these efforts, much of the domain-specific information remains in unstructured natural language text or scattered across different repositories.

To date, the interpretation and validation of gene expression clusters remains a time-consuming manual task: the expert connects text-encoded domain knowledge to the data and hypothesizes about the relationships hidden within the clusters. While this is feasible for a small set of genes, it becomes cumbersome for larger or even genome-wide sets. Consequently, the expert's environment can be characterized as split into a data world, containing the high-throughput data and statistical analysis methods, and a knowledge world, containing the domain knowledge, which is predominantly present in free-text form. The knowledge discovery process can hence be characterized as the expert's interaction with both the data world and the knowledge/text world.

To increase the efficiency of this interaction, intelligent systems should soften the current separation of the two worlds (i.e. the separation between tools for data analysis and tools for information retrieval) by following an integrated approach to this biological challenge. Possible steps towards removing the barriers between the data-oriented and text-oriented worlds are the following:

  1. Intelligent support. More customized text-based methods that build on the data analysis (e.g. information retrieval methods using contextual information from the data analysis).
  2. Semi-automatic support. Semi-automatic text-based methods for interpreting the results of the data analysis (e.g. automatic derivation of a textual profile for a gene cluster produced by the data analysis).
  3. Integrated approach. Methods for integrating data and domain literature (e.g. comparison of clusterings of genes based on the measured data and on the corresponding literature).
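
The comparison of clusterings mentioned in step 3 can be illustrated with a small sketch. It assumes the Rand index as the agreement measure, which is one common choice and not necessarily the one used in the reports cited on this page. The function name `rand_index` and the cluster labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of gene pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    genes in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same genes")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical cluster labels for the same five genes (illustration only):
# one clustering derived from expression data, one from literature.
expression_clusters = [0, 0, 1, 1, 2]
literature_clusters = [0, 0, 1, 2, 2]

score = rand_index(expression_clusters, literature_clusters)
print(f"Rand index: {score:.2f}")
```

A value of 1 means the data-based and literature-based clusterings group the genes identically. Lower values indicate gene pairs that one source groups together and the other separates.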

Our research focuses on the third step: we use common statistical information retrieval methods (the tf-idf weighting scheme, cosine relevance measures, etc.) to represent text-based knowledge and investigate how to combine it with data-inferred correlations. See technical reports 01-69 and 02-51 for more details.
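
As a rough illustration of the text representation named above, the following sketch builds tf-idf vectors for a few genes and compares them with the cosine measure. It is a minimal sketch under stated assumptions: the tf-idf variant (relative term frequency times the logarithm of the inverse document frequency), the function names, and the toy gene documents are all hypothetical and are not taken from the cited reports.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each gene to a sparse tf-idf vector built from its tokens.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for gene, tokens in docs.items():
        counts = Counter(tokens)
        vectors[gene] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already tokenized literature per gene (illustration only).
gene_docs = {
    "geneA": ["apoptosis", "caspase", "signaling"],
    "geneB": ["apoptosis", "caspase", "kinase"],
    "geneC": ["ribosome", "translation", "kinase"],
}

vectors = tfidf_vectors(gene_docs)
sim_ab = cosine(vectors["geneA"], vectors["geneB"])
sim_ac = cosine(vectors["geneA"], vectors["geneC"])
sim_bc = cosine(vectors["geneB"], vectors["geneC"])
print(f"A-B: {sim_ab:.3f}  A-C: {sim_ac:.3f}  B-C: {sim_bc:.3f}")
```

The resulting pairwise text similarities have the same shape as a gene-by-gene correlation matrix computed from expression data, which is what makes the two sources comparable. How best to combine them is the open question addressed in the reports.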

01-69
Antal P., Glenisson P., Fannes G., Boonefaes T., Meszaros T., Rottiers P., Grooten J., De Moor B., Moreau Y., "Towards an integrated usage of expression data and domain literature in gene clustering: representations and methods", Internal Report 01-69, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2001.
02-51
Antal P., Glenisson P., Fannes G., Mathijs J., De Moor B., Moreau Y., "On the potential of domain literature for clustering and Bayesian network learning", Internal Report 02-51, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2002. Accepted for publication in ACM - SIGKDD, Edmonton, Alberta, Canada, 2002.
Copyright © 1998 Katholieke Universiteit Leuven
Design: Gert Thijs
Last update: 2001/03/13