Let’s assume that computing resources are not the limitation. Remember the famous saying that a picture is worth a thousand words. So let’s apply our computing resources to identify which one thousand words from our lexicon are most relevant to a given picture. Maybe we can even adopt a probabilistic framework and identify N words from our lexicon that are probabilistically significant to the picture. And then we can use these words to represent each image or picture.
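As a minimal sketch of the "N most significant words" idea, assume some recognizer has already produced a probability for each lexicon word given a picture. The function name `top_n_words`, the `min_prob` cutoff, and the example scores are all hypothetical, chosen only for illustration.

```python
import heapq

def top_n_words(word_probs, n, min_prob=0.0):
    """Return up to n words with the highest probability, best first.

    word_probs maps each lexicon word to an assumed P(word | image).
    Words scoring below min_prob are dropped before ranking.
    """
    candidates = [(w, p) for w, p in word_probs.items() if p >= min_prob]
    best = heapq.nlargest(n, candidates, key=lambda wp: wp[1])
    return [w for w, _ in best]

# Made-up scores for a single picture (illustrative only).
scores = {"beach": 0.91, "sky": 0.85, "dog": 0.40, "car": 0.03}
print(top_n_words(scores, n=2))
```

The resulting word list is what would then represent the image in a text-style retrieval system.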
This will allow us to use a powerful framework that has been developed for information retrieval, beyond the detection of words. We can still use an inverted file structure, and we can use tools such as ontologies, which help us apply domain knowledge, or even WordNet, which helps in using hierarchies and relationships among words. More importantly, this allows us to treat recognizing words in images as a separate process. That means the image-word recognition problem can be addressed using machinery from image processing, computer vision, and other application-related areas. In fact, in the early stages, this could even be a manual or semi-automatic process. In developing manual or semi-automatic approaches, however, careful consideration should be given to the interface between the processes for assigning words and for using words. The more independent the two processes are, the better the complete system will be.
A by-product of this approach may be the creation and addition of new words to the lexicon. These new words may be meaningful only for images (and video or similar media) and may be marked as such. This is not completely new; it has happened in many specialized application areas. For example, fingerprint analysts have their own special words that are not part of the common lexicon but are definitely part of the extended lexicon. I think it is important to develop such words because they will enrich the language and help bring images, video, audio, and other knowledge modalities under the umbrella of the search framework that has established its utility so well in the text domain.
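To make the inverted file structure concrete, here is a rough sketch that assumes each image has already been assigned a set of words by some separate process. The image ids, the words, and the helper names `build_inverted_index` and `search` are illustrative assumptions, not part of any particular system.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    """Map each word to the set of image ids it was assigned to."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index

def search(index, query_words):
    """Return ids of images that carry every one of the query words."""
    postings = [index.get(w, set()) for w in query_words]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical word assignments; how they were produced does not matter here.
image_words = {
    "img1": {"beach", "sky", "dog"},
    "img2": {"beach", "car"},
    "img3": {"sky", "mountain"},
}
index = build_inverted_index(image_words)
print(search(index, ["beach", "sky"]))
```

Note that the index only ever sees words, so it works the same way whether the words came from a person or from a vision algorithm.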
I’m curious why you say the processes for assigning words and for using words should be independent. Are you concerned that if a mechanism developed for assigning words also relies on using them, it will evolve to assign only the specialized set of words it needs, rather than more generally useful words? What does this imply about manual tagging of images by their creators or viewers?
Yes, Ryan, if the two processes (for assigning and using words) are not independent, then the language remains limited to a set of people. In a more general language, words get into the lexicon when they become well recognized. And once they are in the lexicon, people can start using them as they want, provided their meaning according to the lexicon is more or less preserved. (Ambiguity in natural languages is due to this ‘more or less’.) For images, by developing an independent assignment process, we can use any process to assign the words, including manual processes.
So tagging an image becomes a manual process of assigning the words. If I have a powerful segmentation approach, that could be another assignment process.
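One way to picture this independence, as a sketch only: every assignment process is a function from an image to a set of words, and whatever uses the words never needs to know which process produced them. Both assigners below are stand-ins with made-up data; `segmentation_assigner` in particular is a placeholder, not a real segmentation algorithm.

```python
from typing import Callable, Iterable, Set

# An assigner is any callable that turns an image id into a set of words.
Assigner = Callable[[str], Set[str]]

# Hypothetical tags typed in by people.
MANUAL_TAGS = {"img1": {"beach", "holiday"}}

def manual_assigner(image_id: str) -> Set[str]:
    """Look up words that a person assigned by hand."""
    return set(MANUAL_TAGS.get(image_id, set()))

def segmentation_assigner(image_id: str) -> Set[str]:
    """Placeholder for an automatic, segmentation-based process."""
    fake_output = {"img1": {"sky", "sand"}, "img2": {"car"}}
    return set(fake_output.get(image_id, set()))

def assign_words(image_id: str, assigners: Iterable[Assigner]) -> Set[str]:
    """Union the words from every assigner; callers see only words."""
    words: Set[str] = set()
    for assign in assigners:
        words |= assign(image_id)
    return words

print(sorted(assign_words("img1", [manual_assigner, segmentation_assigner])))
```

Swapping in a better recognizer, or dropping manual tagging entirely, would change only the list of assigners, not the code that searches over the words.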
Is there anything other than limited sets of people, and thus boundaries between these sets? What are extended lexicons other than extensions of language, which by their very nature are always used by limited sets of people? I mean, isn’t it widely understood, in the light of modern and postmodern philosophy and critical theory, that universal languages don’t exist?
Despite my criticisms, which might result from my not fully comprehending everything in the background of your post, I find the development of new words and languages fascinating. I guess something like tag services or ontology services, which host ONLY tags and ontologies, must be envisioned somewhere at this moment. At the moment, tags and ontologies are tied to applications and domains.
Most natural languages do not qualify as a ‘Universal Language’ according to what you say (with which I agree), but they are used by more than a few people, say more than hundreds; otherwise they are usually not considered languages. So the point is that the lexicon, and then the grammar for using the lexicon, allow the language to be used by anyone who is so inclined.
I also find the growth of languages, and lexicons, a fascinating process. And I believe that this will be accelerating.