BioSCENTer - Context and Motivation
Since the beginning of this century, ICT, mathematics/statistics and biology have started merging synergistically into new emerging fields like systems biology, computational biology, synthetic biology, etc. There is a spectacular evolution in these sciences, driven by the following trends in technology and applications.
Biology and biomedicine are rapidly evolving from relatively 'data-poor' to 'data-rich' sciences, due to new, emerging high-throughput technologies, leading to the 'omics' (transcriptomics, proteomics, metabolomics, ...).
As a particular example of the exponential growth of technology, sequencing capacity follows an equivalent of Moore's law in ICT ('The density of transistors per unit area of silicon doubles every 18 months'). This is called Carlson's law, which states that the number of base pairs sequenced and/or synthesized per person per day doubles every 12 months. Both 'laws' are illustrated in the Figure. The corresponding cost is also decreasing at a fast rate, an evolution which will lead us to cheap ultra-high density arrays. The 1000-dollar genome is in sight!
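To get a feeling for what these two doubling times imply, the following minimal sketch compares the cumulative growth factors. It assumes idealized, perfectly exponential growth with the doubling times quoted above (18 and 12 months); the function and constant names are illustrative and not taken from any source.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, assuming a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


# Doubling times as quoted in the text (idealized assumptions).
MOORE_DOUBLING_MONTHS = 18
CARLSON_DOUBLING_MONTHS = 12

for years in (1, 5, 10):
    months = 12 * years
    moore = growth_factor(months, MOORE_DOUBLING_MONTHS)
    carlson = growth_factor(months, CARLSON_DOUBLING_MONTHS)
    print(f"{years:>2} years: Moore x{moore:,.0f}, Carlson x{carlson:,.0f}")
```

Under these assumptions, ten years of growth gives roughly a 100-fold increase for an 18-month doubling time, against roughly a 1000-fold increase for a 12-month doubling time.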
1 To name a few important ones: microarrays, array CGH (Comparative Genomic Hybridization), SNP (Single Nucleotide Polymorphism) arrays, MRI (Magnetic Resonance Imaging), MRS (Magnetic Resonance Spectroscopy), sequencing machines (454), proteomics, Fluorescence Correlation Spectroscopy, ...
There is an increasing number of dedicated (genetically modified) model organisms (e.g. rat, mouse, yeast, ...) for an increasing number of pathologies. There are dramatic breakthroughs such as Induced Pluripotent Stem (iPS) cells as a disease model (differentiated cells can be brought back to a stem-like state relatively quickly and easily by engineering the overexpression of a small number of key factors). This opens perspectives for generating data from experiments with model organisms and patient-specific iPS cells as models for human disease and therapy.
There is a dramatic, exponential growth in the number and size of publicly available databases containing (interactions between) sequence information, gene expression data, phenotypic data, proteins and protein-protein interactions, cross-species genomes, ontologies (structured domain-specific vocabularies), text in articles (e.g. Medline), etc. In the Figure to the right, we see the doubling of the number of base pairs in GenBank every 18 months. In addition, there is an increasing number of multi-center collaboration projects, biobanks of cells, tissue, blood and other fluids, forensic data and population banks organized on a regional or national level, containing clinical, genealogical, genetic, metabolic and lifestyle-related data. Integrating these multiple types of heterogeneous data is a complex and formidable, yet fruitful, challenge. Moreover, subdisciplines come and go, as illustrated in the Figure at the bottom right.
The development of ICT is driven by Moore’s law (doubling of memory and computing power every 18 months) and by hardware trends (e.g. network broadband capacity: 100 times faster over the last 10 years). Of course, these drivers are very relevant in today’s systems biology research. Additionally, there will be a development of (wireless) point-of-care diagnostic tools, home and personalized devices (e.g. the Human++ program at IMEC) and e-health web information systems.
Modern biological and biomedical research is driven by the 4M-model. Hypotheses are first formulated in the form of a mathematical-statistical-computational Model (network, biochemical, ...). Via experimental design, such as molecular genetics, cell engineering or sampling patients, the system is then Manipulated, after which data are Measured (e.g. via high-throughput technologies, imaging, biodevices, markers) and finally, via computations and web searches, Mined to validate or invalidate the original hypotheses. This leads to a new model and the cycle is repeated.
Nowadays in biological/biomedical research, one tries to explain complex phenotypes as the outcome of local interactions between numerous biological components, whose activity may be spread temporally and spatially across several layers of scale, from atoms over molecules to tissues and organisms, and from genomics, transcriptomics and proteomics to metabolomics. Indeed, biological systems are inherently characterized by multiscale complexity.
Therefore, biological modeling problems are far more complex and challenging than the ‘classical’ ones we learned to solve. Unraveling biological systems will therefore necessitate thorough and true multidisciplinary collaboration between biologists, statisticians, software and hardware engineers, physicians and pharmacologists, as depicted in the Figure below:
The number of applications for systems, computational and synthetic biology seems unlimited:
- Science: biological systems understanding
- Health: disease management, new and inexpensive diagnostics, microbial sensors and health monitors
- Pharma: in silico drug discovery, customized medicine, toxicogenomics, 'smart' drugs, pharmaceuticals
- Food/health: nutrigenomics, nutraceuticals, functional foods, tissue engineering
- Safety: biofilms, infectious disease management (bird flu, malaria, ...)
- Environment: microbial ecology, destruction of toxic waste and soil pollutants, water quality control, biosensors
- Energy: biofuels
- Ecology: protected area conservation, integrated forest management
- Agro: novel crop design, sustainable agriculture, improved resistance, yield optimization, food supply chain management
- Habitat: metagenomics, biodiversity