
Marc Claesen

Research

Datamining algorithms for patient-driven e-health applications

The goal of this PhD project is to apply existing data-mining techniques to the data available to Belgian health care providers ("mutualities"). These providers have at their disposal all information that their customers submit to obtain medical refunds; this does not include diagnostic information.

The reason for testing the effectiveness of data mining on this enormous data source is to lay a foundation on which new e-Health applications can be built. In a first step, existing data-mining techniques such as clustering will be applied; later on, if necessary, those techniques will be customized to work on the data at hand.

The e-Health applications we wish to construct provide decision support for customers. This general concept aims to give patients precise and correct information about their illness, possible treatments and so on, so that they can make informed decisions. Studies have shown that this leads to higher patient satisfaction and a greater willingness to participate actively in treatment, and sometimes even to better outcomes.

e-Health applications

Inferring caretaker quality

From their customer records, health care providers can infer a great deal of information about various caretakers. Statistics can be compiled on prescription behaviour, and comparison charts can be created for different caretakers and care centers. Prescriptions can be used as a proxy to discover customer illnesses. By clustering together customers with a certain illness, the caretakers treating those customers can be compared. Important information in this respect includes the number of patients treated (an indication of expertise), the average recovery time, costs, and more.
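A minimal sketch of such a comparison is given below. It assumes hypothetical treatment records holding a caretaker identifier, a patient identifier, a recovery time in days and a cost; these field names and values are made up for illustration and do not come from the actual data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (caretaker_id, patient_id, recovery_days, cost_eur).
records = [
    ("caretaker_A", "p1", 10, 200.0),
    ("caretaker_A", "p2", 14, 260.0),
    ("caretaker_B", "p3", 21, 180.0),
    ("caretaker_A", "p1", 12, 220.0),
]

def summarise_caretakers(records):
    """Group records per caretaker and compute simple comparison statistics."""
    grouped = defaultdict(list)
    for caretaker, patient, days, cost in records:
        grouped[caretaker].append((patient, days, cost))
    summary = {}
    for caretaker, rows in grouped.items():
        summary[caretaker] = {
            # Distinct patients, so repeat visits are not counted twice.
            "patients": len({patient for patient, _, _ in rows}),
            "mean_recovery_days": mean(days for _, days, _ in rows),
            "mean_cost": mean(cost for _, _, cost in rows),
        }
    return summary

summary = summarise_caretakers(records)
```

In practice such averages would only be comparable within a group of customers with the same inferred illness, which is why the clustering step comes first.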

Future perspectives for patients with chronic illnesses

Patients suffering from chronic diseases often wish to get an idea of what the future has in store for them. Based on the information that health care providers have about other customers with the same illness, they can offer something like a crystal ball. Important aspects of this application include (1) identifying equivalent illnesses based on the information at hand, (2) identifying the different stages of the illness and (3) summarising possible evolutions from the stage a specific patient is in.
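One simple way to summarise possible evolutions, point (3) above, is to count how often patients moved from one stage to the next. The sketch below assumes that each patient's history has already been reduced to an ordered list of stage labels and that the next stage depends only on the current one (a first-order Markov assumption). The stage names and histories are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stage sequences, one per patient, ordered in time.
histories = [
    ["early", "moderate", "severe"],
    ["early", "moderate", "moderate"],
    ["early", "early", "moderate"],
]

def transition_probabilities(histories):
    """Estimate P(next stage | current stage) from consecutive observations."""
    counts = defaultdict(Counter)
    for stages in histories:
        for current, nxt in zip(stages, stages[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

probs = transition_probabilities(histories)
# probs["early"] gives the estimated outlook for a patient currently in the "early" stage.
```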

Techniques and algorithms

Clustering

The most frequently used techniques in this PhD will be clustering and, where possible, supervised classification. Clustering aims to discover natural groups (clusters) in a dataset. Many clustering algorithms need to know the number of clusters in advance; this is the first challenge. Secondly, we have to test different clustering algorithms and similarity metrics to see which of them yield the best results and to gain insight into whether customization can prove useful.
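A common way to approach the first challenge is to run the algorithm for several candidate numbers of clusters and keep the one with the best internal quality score. The sketch below uses plain k-means with Euclidean distance and the mean silhouette coefficient. Both are illustrative choices rather than the methods selected in this project, and the toy points are made up.

```python
import math

def kmeans(points, k, iterations=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that is farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (needs at least two clusters); higher is better."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k = max(range(2, 5), key=lambda k: silhouette(points, kmeans(points, k)))
```

The same loop can be repeated with another distance function in place of math.dist to compare similarity metrics, which is the second challenge mentioned above.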

Survival analysis

This statistical technique attempts to model the occurrence of events in time. Survival analysis is used in biology to model life expectancies, but can also be used in our context to model the transition from one stage of an illness to another. Survival analysis revolves around constructing a suitable hazard function, which we will base on the clusters of disease stages.
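To make the notion of a hazard concrete, the sketch below computes the standard Kaplan-Meier estimate, in which the hazard at each event time is the fraction of patients still at risk who experience the event. This is a generic non-parametric estimator, not the cluster-based hazard function envisaged in this project, and the durations are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, hazard, survival)] for each time with at least one event.

    durations: time until the event, or until follow-up ended.
    observed:  True if the event happened, False if the record is censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({d for d, e in zip(durations, observed) if e}):
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        hazard = events / at_risk   # discrete hazard estimate at time t
        survival *= 1.0 - hazard    # product-limit update of the survival estimate
        curve.append((t, hazard, survival))
    return curve

# Hypothetical months until a patient moves to the next stage of an illness.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Censored records (patients whose follow-up ended before any transition was seen) still count as at risk up to their last observation, which is what distinguishes this from a naive average of durations.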

Recommender systems

These are used to filter information based on user preferences. The goal is to present users with information that is of interest to them. In our context, preferences refer to specific treatments, logistics, costs and so on. Preferences also refer to the way information is presented to the user. There are many ways to do this, and the best way is highly subjective. Possible criteria for deciding how to present information include user age, literacy and language.
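As a minimal illustration of preference-based filtering, the sketch below ranks items by the summed weight of the tags they carry. The items, tags and weights are hypothetical, and a real recommender system would typically learn such weights from user behaviour rather than take them as given.

```python
def rank_items(items, preferences, top_n=2):
    """Rank items by the summed preference weight of their tags."""
    def score(item):
        # Tags the user has no stated preference for contribute nothing.
        return sum(preferences.get(tag, 0.0) for tag in item["tags"])
    return sorted(items, key=score, reverse=True)[:top_n]

# Hypothetical pieces of information, each described by a set of tags.
items = [
    {"title": "Treatment option A", "tags": {"low_cost", "home_care"}},
    {"title": "Treatment option B", "tags": {"hospital", "short_recovery"}},
    {"title": "Treatment option C", "tags": {"low_cost", "short_recovery"}},
]
# Hypothetical user preferences: positive weights attract, negative weights repel.
preferences = {"low_cost": 1.0, "short_recovery": 2.0, "hospital": -0.5}
top = rank_items(items, preferences)
```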

 

PhD thesis: Machine Learning on Belgian Health Expenditure Data: Data-driven Screening for Type 2 Diabetes

PhD abstract:

Diabetes mellitus is a metabolic disorder characterized by elevated blood sugar levels, which may cause serious harm to many of the body's systems. The disease and its complications can be managed effectively when detected early, though this proves difficult as the time between onset and clinical diagnosis may span several years. Furthermore, estimates indicate that over one third of diabetes patients in developed countries are undiagnosed.

We investigated the potential of Belgian health expenditure data as a basis to build a cost-effective population-wide screening approach for (type 2) diabetes mellitus, aspiring to improve secondary prevention by speeding up the diagnosis of patients in order to initiate treatment before the disease has caused irrevocable damage. We used health expenditure data collected by the National Alliance of Christian Mutualities, the largest social health insurer in Belgium. This data comprises basic biographic information and records of all refunded medical interventions and drug purchases, thus providing a long-term longitudinal overview of over 4 million individuals' medical expenditure histories.

We investigated the survival of diabetes patients since the start of a certain pharmacological therapy and found large differences between patients on different therapies. Then, we used advanced machine learning techniques to build predictive models to identify diabetes patients based on health insurance records. We had to overcome several challenges to make this application possible. In particular, we investigated methods to build and evaluate predictive models without having a list of patients who are known with certainty not to have diabetes. Implementations of all our contributions have been made available in open-source software libraries to help other researchers.

The screening method we developed performs competitively with existing state-of-the-art approaches. This exceeded our expectations, since health expenditure data omits most information about the typical risk factors used by other screening methods (BMI, lifestyle, genetic predisposition, ...). As such, the combination of health expenditure data and additional information about risk factors is a promising avenue for future research in screening for diabetes mellitus. Finally, our approach has a very low operational cost since we only used readily available data, which effectively removes one of the key barriers to population-wide screening for diabetes.
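The evaluation problem mentioned in the abstract, having known patients but no confirmed non-patients, can be illustrated with a small sketch. It assumes that the known patients are representative of all patients, in which case recall can be estimated from them alone. Precision cannot be computed directly, because unlabeled individuals may be undiagnosed patients. The scores and labels are made up, and this is not the evaluation method developed in the thesis.

```python
def pu_screening_summary(scores, known_positive, threshold):
    """Summarise a screening rule using only positive and unlabeled cases.

    Returns (recall estimated on known positives, fraction of people flagged).
    """
    flagged = [s >= threshold for s in scores]
    positives_flagged = sum(1 for f, p in zip(flagged, known_positive) if f and p)
    recall = positives_flagged / sum(known_positive)
    flagged_fraction = sum(flagged) / len(scores)
    return recall, flagged_fraction

# Hypothetical model scores; True marks a known patient, False marks "unlabeled".
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]
known_positive = [True, True, True, False, False, False, False, False]
recall, flagged_fraction = pu_screening_summary(scores, known_positive, threshold=0.5)
```

A rule that reaches high recall while flagging only a small fraction of the population is a promising screening candidate, even though its exact precision remains unknown.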

Jury

Prof. dr. ir. Bart De Moor (supervisor)
Mr. Frank De Smet (co-supervisor)
Prof. dr. ir. Paul Sas (chairman)
Prof. dr. ir. Johan Suykens (secretary)
Prof. dr. ir. Hendrik Blockeel
Prof. dr. Chantal Mathieu
Prof. dr. ir. Jesse Davis
Prof. dr. Marco Loog, TU Delft

 

Finished projects

Publications

Contact information