We point out a disturbing deficiency in the field of system identification: many published results are not reproducible. In many cases, the datasets and time series used to illustrate identification methods and algorithms in these publications are not freely available. We propose to remedy this serious deficiency by setting up a publicly accessible website, called DAISY, to which authors can submit the datasets used to illustrate the claims and algorithms in their papers. Several additional benefits are discussed as well.
Keywords: System identification, signal processing, time series analysis, data analysis, modeling, datasets.
Announcement: DAISY: A Database for Identification of Systems
http://www.esat.kuleuven.be/sista/daisy/
Description: DAISY is an Internet application, mainly consisting of a database of datasets used in system identification or time series analysis. The system can be used in two directions: you can download datasets from the database (e.g. to compare or test identification algorithms), and you can upload datasets to the database (so that other people can use your datasets to verify their algorithms, or to reproduce or enhance your results). The datasets in the database are subject to a mild review, so that we can guarantee a certain level of quality.
Benefits: DAISY is an answer to a real challenge in system identification and signal processing research, namely to ensure the reproducibility of results based on real data. Datasets are often used to illustrate algorithms in publications, but these datasets are almost never public, so nobody is able to verify the stated results. Using DAISY, this problem is history: if you need to verify your algorithm with a real-world example, you can use datasets from DAISY, or you can submit the dataset you used to DAISY. This way everybody can (try to) reproduce your results. Other benefits of DAISY include increased collaboration between researchers, the gradual evolution of certain datasets into benchmarks, and the publication of comparisons between different methods or algorithms.
Organisation: DAISY is a website consisting of a page with the datasets (sorted by category), a page where you can submit datasets, and pages with relevant links, all publications and talks about DAISY, a bibliography and software overview, some hit statistics, and last but not least the acknowledgments to our sponsors. DAISY is being developed and maintained at the department of Electrical Engineering of the K.U.Leuven, in the research group SISTA, under the responsibility of Bart De Moor.
Reproducibility is one of the most basic characteristics of modern scientific research. Yet it turns out that in our own beloved discipline of System Identification this very aspect is often neglected or ignored. How many times have you seen papers at conferences, in journals, or while doing reviews, in which algorithms are tested on datasets or time series that are not publicly available? How many times have you tried to reproduce such results on your own datasets, maybe with the same method, or maybe with your own, but for reasons unknown to this day, failed to do so? How many times have you tried to obtain datasets from an author who seems to have disappeared from the earth after publishing the work?
This is all over now! We should do something about reproducibility in system identification, and as a matter of fact, we can. This is what this talk is about.
DAISY is an interactive DAtabase for Identification of SYstems, which can be found on the World Wide Web at the URL: http://www.esat.kuleuven.be/sista/daisy/
In this talk, we will argue how DAISY will contribute to achieving reproducibility of research results in system identification and modelling, how some of its datasets may evolve over time into real benchmarks, how DAISY will stimulate collaboration between researchers active in system identification, and how it (she?) will be instrumental in establishing comparisons of concepts, methods and algorithms.
DAISY's central objects are datasets, which, once submitted, undergo a moderate review procedure and, when accepted, are publicly available on the Internet. We also allow for programs (e.g. in Matlab) that generate data. Whether you work on prediction error methods, subspace identification, structured total least squares, maximum likelihood, time or frequency domain or adaptive identification, nonlinear, neural or fuzzy system identification, etc., DAISY will provide you with the right dataset to illustrate your method. Datasets are also grouped according to their application domain (e.g. process industry, biomedical, econometric, etc.), so one can assess the potential success of one's preferred method across all these different datasets and disciplines!
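To make the idea of testing a method on a shared dataset concrete, here is a minimal sketch in Python. It is not part of DAISY and does not use DAISY's file format: the input-output record is synthetic (a stand-in for a downloaded dataset), and the function names `fit_arx1`, `simulate_arx1` and `fit_percent` are hypothetical. It fits a first-order ARX model by least squares on the first half of the record and reports a fit percentage on the second half, the kind of number that anyone with the same data could recompute.

```python
import math


def fit_arx1(u, y):
    """Least-squares fit of y[t] = a*y[t-1] + b*u[t-1] (first-order ARX)."""
    # Accumulate the entries of the 2x2 normal equations.
    syy = syu = suu = sy1 = su1 = 0.0
    for t in range(1, len(y)):
        yp, up = y[t - 1], u[t - 1]
        syy += yp * yp
        syu += yp * up
        suu += up * up
        sy1 += yp * y[t]
        su1 += up * y[t]
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("data not informative enough to identify a and b")
    a = (sy1 * suu - su1 * syu) / det
    b = (su1 * syy - sy1 * syu) / det
    return a, b


def simulate_arx1(a, b, u, y0=0.0):
    """Simulate the model from initial output y0, driven by the input u."""
    y = [y0]
    for t in range(1, len(u)):
        y.append(a * y[-1] + b * u[t - 1])
    return y


def fit_percent(y_true, y_model):
    """100 * (1 - ||y_true - y_model|| / ||y_true - mean(y_true)||)."""
    mean = sum(y_true) / len(y_true)
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(y_true, y_model)))
    den = math.sqrt(sum((p - mean) ** 2 for p in y_true))
    return 100.0 * (1.0 - num / den)


# Synthetic, noise-free stand-in for a shared dataset (assumption: a real
# record would be read from a downloaded file instead).
u = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
y = simulate_arx1(0.8, 0.5, u)

# Estimate on the first half, validate on the second half.
u_est, y_est = u[:100], y[:100]
u_val, y_val = u[100:], y[100:]

a_hat, b_hat = fit_arx1(u_est, y_est)
y_sim = simulate_arx1(a_hat, b_hat, u_val, y0=y_val[0])
fit = fit_percent(y_val, y_sim)

print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, validation fit = {fit:.2f}%")
```

Because the split and the fit measure are fixed in the script, two researchers running it on the same public record should obtain the same figures, whichever identification method replaces `fit_arx1`.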
We will describe the organisation of DAISY, its usage and submission procedures and give some examples. DAISY also contains an extensive bibliography of books on system identification and provides hyperlinks to existing commercial or non-commercial software packages as well as hyperlinks to similar sites or to professional people and organisations that do system identification for a living.
If technology permits, we'll give a live demonstration.