Data Fusion and Integrated Learning in Microarray Data Analysis
The dissemination of microarray (MA) technology, which makes it possible to measure the activity of a large number of genes simultaneously, has given new impetus to a broader understanding of their functional roles and their complex interaction networks. The complexity of these fundamental biological questions, however, demands systems that allow an open-ended organization of human expertise, the measured statistical data and the electronic domain literature. The availability of a large amount of electronic domain literature has fundamentally changed the traditional way of analyzing data. Nowadays, a significant amount of domain knowledge is available in unstructured or structured electronic form, e.g. natural-language documents, structured documents, databases and knowledge bases. Many efforts in the bioinformatics community are currently directed towards developing machine-friendly representations, standardizing state-of-the-art biological information resources and establishing cross-linked or combined information sources. Despite these efforts, much domain-specific information still resides in unstructured natural-language text or is scattered across different repositories.
To date, interpreting and validating gene expression clusters remains a time-consuming manual task: the expert connects text-encoded domain knowledge to the data and hypothesizes about the relationships hidden within the clusters. This is feasible for a small set of genes but becomes increasingly cumbersome for larger, let alone genome-wide, sets. Consequently, the expert's environment can be characterized as split into a data world, containing the high-throughput data and the statistical analysis methods, and a knowledge world, containing the domain knowledge, predominantly in free-text form.
The knowledge discovery process can hence be characterized as the expert's interaction with both the data world and the knowledge (text) world. To make this interaction more efficient, intelligent systems should bridge the current separation between the two worlds (i.e. between tools for data analysis and tools for information retrieval) by taking an integrated approach to this biological challenge. Possible steps towards removing the barriers between the data-oriented and text-oriented worlds are the following:
Our research focuses on the third step: we use common statistical information retrieval methods (e.g. the tf-idf weighting scheme and cosine relevance measures) to represent text-based knowledge, and we investigate how to combine this representation with correlations inferred from the data. See technical reports 01-69 and 02-51 for more details.
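The text representation and its combination with data-inferred correlations can be sketched in a few lines. This is a minimal, hypothetical illustration, not the method described in technical reports 01-69 and 02-51. It assumes that each gene has a tokenized text profile (for instance, words taken from abstracts that mention it), weights the terms with tf-idf (raw term count times log(N/df)) and compares genes with the cosine measure. As an assumed fusion rule, the text similarity is averaged with the Pearson correlation of the genes' expression profiles, rescaled to [0, 1]. The function names, the weight `alpha`, the gene names and all data are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn {gene: [tokens]} into sparse tf-idf vectors {gene: {term: weight}}.

    Weight = raw term count * log(N / df), where N is the number of gene
    profiles and df is the number of profiles containing the term.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per profile
    return {
        gene: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for gene, tokens in docs.items()
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a flat profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def combined_similarity(g1, g2, text_vecs, profiles, alpha=0.5):
    """Hypothetical fusion rule: weighted mean of text and data similarity.

    tf-idf weights are non-negative, so the cosine lies in [0, 1]; the
    correlation is rescaled from [-1, 1] to [0, 1] to match.
    """
    text_sim = cosine(text_vecs[g1], text_vecs[g2])
    data_sim = (pearson(profiles[g1], profiles[g2]) + 1.0) / 2.0
    return alpha * text_sim + (1.0 - alpha) * data_sim


# Made-up toy data: literature tokens and expression profiles per gene.
DOCS = {
    "geneA": ["dna", "repair", "kinase", "repair"],
    "geneB": ["dna", "repair", "checkpoint"],
    "geneC": ["membrane", "transport"],
}
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

if __name__ == "__main__":
    vecs = tfidf_vectors(DOCS)
    for g1, g2 in [("geneA", "geneB"), ("geneA", "geneC")]:
        print(g1, g2, round(combined_similarity(g1, g2, vecs, PROFILES), 3))
```

Setting `alpha=1` ranks gene pairs purely by literature similarity, while `alpha=0` ranks them purely by expression correlation. A linear combination is only one of many possible ways to fuse the two sources.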
Copyright © 1998 Katholieke Universiteit Leuven. Design: Gert Thijs. Last update: 2001/03/13