As computer scientists, we distinguish between structured, semi-structured, and unstructured data. For example, relational databases are designed to deal with structured data, whereas unstructured data is very difficult to deal with.
One is forced to ask what the fundamental difference between unstructured and structured data is, and how data becomes structured or unstructured. Like many other really interesting questions, this one is usually not asked: researchers just assume that they have structured (or unstructured) data and that they have to live with it. There is no choice. Is that really true, however?
It appears that most of the natural world is closer to being unstructured data, at least on the surface. In fact, most sciences try to ‘discover’ the structure and relationships in the data that they get. But that is not the problem for computer scientists. Computer scientists must deal with the processing, storage, and communication of data, and with access to it (retrieval or search). In the last few years, much of the concern has been about access to large volumes of data. Paradoxically, progress in technology is itself creating interesting technical challenges. A major challenge has become access to text, images, audio, and video, and this has resulted in what we commonly call ‘the semantic gap’. The semantic gap arises from the difference between the way humans think and the way we computer scientists think while designing our algorithms and technology. Humans like to think in terms of objects and events, and build conceptual structures using these objects and events. In computer science, on the other hand, we think in terms of bits and bytes and then start building structures such as ‘trees’, ‘graphs’, ‘arrays’ (which could be images, audio, or video), and the like. Unfortunately, there is no easy way to go from what computer scientists build and represent in computers to the way people normally think about objects and events in the real world.
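To make the gap a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the tiny `image` array, the `Event` class, and the `mean_brightness` function are hypothetical names, not part of any real system. The point is only that the computer's side is an array of numbers, while the human's side is a description of an event.

```python
from dataclasses import dataclass

# What the computer stores: a tiny grayscale "image" as an array of numbers.
# (Hypothetical data, chosen only for illustration.)
image = [
    [0, 0, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]


# What a person has in mind: objects taking part in an event.
# (Hypothetical structure; the fields are assumptions, not a standard.)
@dataclass
class Event:
    what: str
    who: str
    where: str


description = Event(what="birthday party", who="Alice", where="garden")


def mean_brightness(pixels):
    """Average pixel value: easy to compute directly from the array."""
    values = [value for row in pixels for value in row]
    return sum(values) / len(values)


# Going from the array to a number is trivial.
print(mean_brightness(image))

# Going from the array to the Event is not: nothing in the numbers says
# "Alice" or "birthday party". That missing step is the semantic gap.
print(description)
```

Questions about the array, such as its average brightness, take a few lines of code. Questions about the event, such as who was there, cannot be answered from the array alone without some further interpretation.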
How is the semantic gap related to structured or unstructured data? Well, I think there is a strong reason to think about the semantic gap and the structure in data together.