After the ACM Multimedia conference, there were several very interesting workshops: Effective Telepresence, Continuous archival and retrieval of personal experiences, Next Generation Residential broadband challenges, Story representation mechanism and context, Video surveillance and sensor networks, and Multimedia Information Retrieval. Though I had registered to attend two or three workshops, I ended up attending only one: Multimedia Information Retrieval. I presented a paper there and participated on a panel. I wanted to attend Pilho Kim’s presentation and demos because he is my student, but because I was at this workshop and got very busy talking with people, I could not walk to the other buildings to listen to him or to see how the other workshops were going.
The goal of workshops at these conferences is to encourage researchers to present their work in progress. At one time, there used to be a lot of time for discussion of these papers, so emerging ideas could be analyzed and discussed in good detail. That goal slowly changed as the number of papers submitted to workshops increased. Now there is hardly any time for discussion. In fact, in many workshops the time given for a presentation is less than at conferences, and no questions are asked, either because there is no time left or because the audience rarely has any questions after the author has spent all the time on mathematical details. It is surprising how much emphasis researchers put on mathematical details rather than on the concepts behind their work. I remember how Richard Feynman used to argue that the maximum damage to science is done by definitions and mathematical formalisms.
Multimedia information retrieval is a topic that I am very interested in. There were other workshops, but this was definitely the one most directly in my area of interest. How do you use different media synergistically to represent information independent of media, and organize it so one can easily find the information in the most appropriate medium? It appears so obvious to me that in video, unless you combine audio and image sequences, you cannot understand much. Similarly, even for photos, unless you know the context, it is not easy to understand what is going on, and the context comes from different sources, mostly non-pictorial. Yet I find it very interesting, and disappointing, that after several years of this workshop, even this year most sessions were on topics such as Learning, Images, Video, Applications, and Web-based systems. Interestingly, for most of these topics there were two sessions. But there was no session emphasizing multimedia, the main topic of the workshop.
Another thing that I find very limiting is that most people adopt a strongly technology-driven perspective and completely forget what problem they are really trying to solve or understand. Researchers usually fall in love with their tools and start gaining a better and better understanding of them, completely forgetting why these tools were considered in the first place. Based on some reading, I came up with a sentence that I used in a panel to capture this in a lighter way: The real challenge is to solve a problem rather than worry about the next ‘Minimum Publishable Unit’. Similarly, I joked that ‘Depth in research is knowing more and more about less and less until you know everything about nothing’. Unfortunately these things are not just jokes; they are more true than one would like to believe.
A very good thing is that many people came to me to talk about these points and want to do something to correct the situation. If enough people become sensitive to these things, maybe the research culture will start improving.
On the positive side, I see good progress in many techniques that are soon going to be ready for semi-automatic annotation of images and videos. These will be very useful in building some real systems.