MIR 1: Data Sets

I have been thinking a bit about multimedia information retrieval, so there may be a few posts related to this topic.

I will start by saying that it is nice to see a data set that has been collected and can be used for testing and evaluation purposes. This data set is available from http://cophir.isti.cnr.it/ . They claim to have collected 100 million images from Flickr in this collection. The following is the abstract of the white paper on this site, which gives an idea of the collection:

As the number of digital images is growing fast and Content-based
Image Retrieval (CBIR) is gaining in popularity, CBIR systems should
leap towards Web-scale datasets. In this paper, we report on our experience
in building a test collection of more than 50 million images, with
the corresponding descriptive features, to be used in experimenting new
techniques for similarity searching. Since no collection of this scale was
available for research purpose, we had to tackle the non-trivial process of
image crawling and descriptive feature extraction (we used five MPEG-7
features) using the European EGEE computer GRID. The result of this
effort is a test collection, the first of such scale, that will be opened to the
research community for experiments and comparisons.

