About the project:

The project Cognitive Vision Systems (CogViSys) is part of the EU-funded Information Society Technologies (IST) Programme and started on May 1st, 2001. Besides K.U. Leuven (Belgium), the project partners are the Universities of Karlsruhe and Freiburg (Germany), the University of Oxford (United Kingdom), INRIA (France) and ETH Zürich (Switzerland).
The main goal of this three-year project is to build a vision system that can be used in a wider variety of fields and is re-usable. This is to be achieved by introducing self-adaptation at the level of perception, by providing categorisation capabilities, and by making the knowledge base explicit at the level of reasoning, so that it can be changed.

Objectives:

Over the last three decades there has been significant progress in computer vision: it is now used successfully for specific problems in many application areas. However, progress on cognitive tasks such as "general" recognition (i.e. categorisation) and scene understanding has been limited. Despite these successes, the remarkable abilities of human visual perception are still far beyond those of machine vision. These abilities rest partly on generic image analysis processes, including powerful segmentation and figure-ground processes, colour constancy, and the ability to estimate the 3D shape of objects from a variety of cues. Human visual perception is equally remarkable for the largely unconscious mobilisation of knowledge about what the image means, or connotes.

It has become clear that image analysis, memory (knowledge) and reasoning are closely interwoven in the process we call perception. How this interweaving is achieved, whether in the human visual system or in artificial vision systems, is not at all understood, despite significant progress in Artificial Intelligence, in particular in knowledge representation. We do know, however, that simplistic, essentially serial, process organisations such as "bottom-up" or "top-down" cannot explain the rapid, reliable, opportunistic nature of human visual perception. A Cognitive Vision System aims to replicate in artificial vision systems this human-like interweaving of image processing, reasoning, and memory/knowledge.

Aside from the contribution this could make to understanding human vision, which is important in its own right, there are also important practical reasons for building cognitive vision systems. Image analysis algorithms are increasingly required to work in applications (for example, traffic control, security, and medical diagnosis) with a high degree of variability in scene content. Increasingly, they are also tools in the hands of experts who know little about image processing. A cognitive vision system that can incorporate, mobilise, and develop a knowledge base relevant to the application is necessary to reach the required performance and the level of trust needed for routine use.

Of course, many successful artificial vision systems have been built and are used routinely in a range of applications. However, to the extent that such systems embody knowledge and reasoning, this knowledge is "wired-in" or "compiled": it is built into the program code or encoded in a data structure. With such dedicated implementations, it is hard to adapt (or generalise) the same basic image processing components to work in another domain where different knowledge is required. This, then, is the challenge addressed in this proposal:

To build a vision system that is re-usable, by introducing self-adaptation at the level of perception and by making the knowledge base explicit at the level of reasoning, thereby enabling the knowledge base to be changed.

A virtual commentator as showcase

To make these ideas concrete, CogViSys aims to develop a virtual commentator that can translate visual information into a textual description. This is the unifying theme of the project. To build this virtual commentator, several conceptual subgoals have to be achieved:

  1. it is crucial that the more cognitive processes can start from a firm basis; hence, some effort will go into state-of-the-art cue integration,
  2. rather than recognising particular textures, objects or motions, CogViSys aims at recognising instances of classes of them; hence, a key goal is to make important progress in the area of categorisation,
  3. approaches will be developed to express and use knowledge about the interpretation of scenes explicitly.
The virtual commentator will have three instantiations within the project:
  1. traffic scene surveillance,
  2. sign language interpretation, and
  3. automated video annotation.

The first application aims at the textual description of traffic situations, at a user-specifiable level of abstraction. The second aims at the automatic transcription of sign language into text, in less constrained situations than current systems can cope with (e.g. without special hardware). The third yields a kind of virtual TV commentator and a virtual assistant in a wearable computing context.

The performance of the virtual commentator, and its progress over the course of the project, provide criteria for assessing the work carried out. The commentator will make explicit the success rate of the categorisation algorithms and how effectively the system transfers between different applications. This transferability is the first criterion of success of the project. A second criterion will be the number of mistakes in the commentator's text output in each of the application domains used as example cases: the fewer, the better. A third criterion will be the increase in the system's scope, i.e. the gradual increase in the completeness of its descriptions.