Lately I have been looking at many researchers and research papers in the field of computer vision, and I could not help but start thinking about this area as a researcher and a faculty member. I have experience in this field. I started working in computer vision in 1976 and was one of the most active researchers in the field (based on publication numbers, size of research group, etc.) until 1995. Even now, my research is concerned with the central issue of how to extract and utilize visual information in large systems. I am more interested in seeing how different sources, including visual ones, can be utilized together, and hence I interact more with multimedia research than with computer vision. I had a chance to see the field grow.
I find computer vision to be one of the most challenging research areas, one that could significantly advance the impact of computing. Unfortunately, this field is also so broad that researchers often get carried away and focus their research mostly in peripheral directions; worse, they start treating those peripheral issues, or the tools used, as the central problem and completely forget the central issue. Since I invested a significant amount of energy and effort in this field and still continue to think about solving its problems, I started thinking about the state of the field, and hence this post.
The goal of a computer vision system is to extract meaningful information from one or more images, including video. This has been a difficult problem, partly because it is inherently hard and partly because of the research culture. Computer vision research naturally attracts researchers from different fields, including human vision (both physiology and psychology, including cognitive science), optics, computer graphics, mathematics, and machine learning. Of course, there are people who primarily want to develop computer systems to extract information from one or more images for different applications. But in a large group of researchers with a multitude of interests, goals, and philosophies, it is natural that even they forget what their goal really is. The result is a very interesting field that starts addressing problems in every other field but never gets back to its own roots: the goal of developing techniques that will help recover information from one or more images, including video.
Many researchers see computer vision as an area that will help them explain theories in human vision. Many others think they can use computer vision to demonstrate the applicability of techniques from their own areas, including mathematics and statistics, graphics, and machine learning. When reading papers in computer vision or talking to prominent researchers publishing in computer vision, one gets a strong impression that they are very interested in either justifying an existing theory about human vision, also commonly called vision science, or developing a new theory to explain a particular aspect of human vision. I consider their research important and respectable, but I also believe that it is not going to solve the problems of developing computers that deal efficiently with images and video in specific applications. Researchers who bring a tool such as machine learning for classification or recognition, or other mathematical approaches, to formulate and solve problems in computer vision are also doing great work, and many times they really are helping computer vision advance. But one must exercise caution. Many, if not most, of these researchers are interested in demonstrating the effectiveness of their approach, and hence they select example images that fit their approach rather than developing approaches that solve the central problem. There is a fundamental difference between the two. The use of Corel images and many other commonly used constrained image sets is doing the community more harm than good.
One culture that has emerged in computer vision, due to conflicting goals and losing sight of the primary goal, is the assumption that the computer vision problem should be solved using only visual information and mathematical models. The human vision system is so effective because it continuously works in concert with all the other sensors and, most importantly, with human memory and the brain. Take away human memory and the other sensors, and human vision becomes very limited. So it is surprising that computer vision researchers even now try to interpret images and video using only the information in that image or video, completely ignoring context, other sensors, memory, and interactivity. I feel that computer vision may be more effective if it uses visual information as a part, possibly the most important part, of a complete information system rather than being developed as an independent component.
Hopefully computer vision researchers will realize this and focus more intellectual energy on recovering information from images and video using all the information tools that are now available in 2007. So many important problems are waiting to be solved, and progress in many application fields really depends on progress in computer vision. In fact, areas facing some of the most challenging problems today, from information systems (including search) to pervasive systems, human-centered computing, and others, are all looking for progress in computer vision.
It is not strange then that students graduating in computer vision are not easily finding positions, while real problems are begging to be solved.