




|
Mining microarray data for oncology
People: Frank De Smet, Yves Moreau
Approximately 40% of the population will sooner or later be faced with cancer, and for more than half of them (55-60%) the disease will be fatal. Cancer is therefore a medical problem of major importance. Diagnosis, staging, prognosis estimation and therapy selection are usually based on empirical data from the literature (derived from clinical studies) and, in some cases, on the personal experience of the physician. The fundamental mechanisms underlying carcinogenesis are in many cases still elusive, although it is assumed that most cancers originate from genetic disorders. A deeper insight into these mechanisms should help in making the right predictions and decisions.
The phenotype of a tumor is determined by its collection of disturbed gene expression levels. Microarrays allow us to measure the expression levels of thousands of genes simultaneously. These microarray experiments are repeated for different samples under different conditions (e.g., tumor cells before and after metastasis, or tumor cells originating from different tumor types). Comparing the expression levels of genes or samples with data mining methods allows us to perform:
- Class discovery (e.g., to determine the different tumor types or to group genes with similar behaviour)
- Class prediction (e.g., to predict diagnosis and staging)
- Feature selection (e.g., to select the relevant genes).
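As an illustration of the feature selection task, the sketch below ranks genes by a simple signal-to-noise score between two sample classes. It is a minimal example under stated assumptions, not the procedure used in the project: the synthetic data, the score and the names `make_data`, `signal_to_noise` and `top_genes` are all hypothetical.

```python
import random
import statistics


def make_data(n_per_class=10, n_genes=50, n_informative=5, shift=4.0, seed=0):
    """Synthetic expression matrix (assumption): rows are samples, columns are genes.

    The first n_informative genes are shifted upwards in class 1 only.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            samples.append(row)
            labels.append(label)
    return samples, labels


def signal_to_noise(samples, labels):
    """Per-gene score |mean_0 - mean_1| / (sd_0 + sd_1)."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        spread = statistics.stdev(a) + statistics.stdev(b)
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores


def top_genes(samples, labels, k):
    """Indices of the k highest-scoring genes."""
    scores = signal_to_noise(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


samples, labels = make_data()
selected = top_genes(samples, labels, k=5)
print("selected gene indices:", sorted(selected))
```

On real data the ranking should be computed inside each cross-validation fold, otherwise the estimated classification performance is likely to be optimistic.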
We applied a variety of algorithms (PCA, LDA, K-means, ...) to several data sets consisting of microarray measurements:
- Microarray analysis of 72 peripheral blood and bone marrow samples of leukemia patients (acute lymphoblastic leukemia and acute myeloid leukemia): Clustering and class prediction yielded nearly perfect results here.
- Microarray analysis of 57 human breast tumor samples of grade 2 and 3 (prediction of tumor grade): cross-validation yielded 70-80% correct classification. This is evidence that microarray data contain information about the clinical behaviour of tumors (grade roughly corresponds to the degree of malignancy and is strongly correlated with prognosis).
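The two kinds of analysis reported above, clustering and cross-validated class prediction, can be sketched as follows. This is only an illustrative example on synthetic data: the source does not state which classifier or cross-validation scheme was used, so the nearest-centroid classifier (a simple stand-in for LDA), the leave-one-out scheme and all function names here are assumptions.

```python
import random


def make_data(n_per_class=12, n_genes=40, n_informative=8, shift=4.0, seed=1):
    """Synthetic two-class expression matrix (assumption, not real patient data)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            samples.append(row)
            labels.append(label)
    return samples, labels


def centroid(rows):
    """Gene-wise mean of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(samples, n_iter=20):
    """Two-cluster K-means with a deterministic farthest-point initialisation."""
    first = samples[0]
    second = max(samples, key=lambda s: sq_dist(s, first))
    centers = [first, second]
    assignment = []
    for _ in range(n_iter):
        assignment = [
            0 if sq_dist(s, centers[0]) <= sq_dist(s, centers[1]) else 1
            for s in samples
        ]
        new_centers = []
        for c in (0, 1):
            members = [s for s, a in zip(samples, assignment) if a == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return assignment


def cluster_agreement(assignment, labels):
    """Fraction of samples whose cluster matches the label, up to cluster renaming."""
    same = sum(a == y for a, y in zip(assignment, labels))
    return max(same, len(labels) - same) / len(labels)


def loocv_accuracy(samples, labels):
    """Leave-one-out cross-validation of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        cents = {
            c: centroid([s for s, l in train if l == c]) for c in sorted(set(labels))
        }
        pred = min(cents, key=lambda c: sq_dist(x, cents[c]))
        correct += int(pred == y)
    return correct / len(samples)


samples, labels = make_data()
agreement = cluster_agreement(kmeans2(samples), labels)
accuracy = loocv_accuracy(samples, labels)
print("cluster/label agreement:", agreement)
print("leave-one-out accuracy:", accuracy)
```

The clustering step never sees the labels, so its agreement with them measures class discovery, while the leave-one-out accuracy measures class prediction on samples held out of training.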
|