Googling a person you met at the last party, or someone you are going to meet in a meeting today, has become very common. A problem you often face is that the same name is spelled in different ways. I know that my first name, Ramesh, appears in mails from my ‘friends’ in at least three different versions: Ramesh, Ramish, and Romesh. I don’t know where the latter two come from, but I do see them commonly. Two more that I see less commonly are Ramess and Ramiz. In any case, I saw in an article today that
Language Analysis Systems, or LAS, of Herndon, Va., has devised a series of tools for solving one of the thornier, but often overlooked, problems in search: finding data on a particular individual in a multicultural, error-prone world. The company’s software takes into account alternative spellings, cultural nuances and other linguistic issues as part of an attempt to return the most relevant information for a search query, rather than a laundry list of close matches.
The technique used is very simple. Just store all equivalents of a name, a kind of name thesaurus, and at query time expand the name using all its equivalents. Even when a person types only one spelling, the search engine then returns results for all equivalent names. Of course, the real work is in collecting all the equivalent names. LAS has done that.
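To make the idea concrete, here is a minimal sketch in Python of query expansion with a name thesaurus. It is only an illustration and not LAS's software: the `EQUIVALENT_NAMES` table, the function names, and the simple word matching are all my own assumptions, and the single group of spellings is just the set of variants of my name mentioned above.

```python
import re

# Hypothetical name thesaurus: each set lists spellings treated as equivalent.
# Collecting these groups is the hard part; this one is purely illustrative.
EQUIVALENT_NAMES = [
    {"ramesh", "ramish", "romesh", "ramess", "ramiz"},
]


def build_index(groups):
    """Map every spelling to the full set of its equivalents."""
    index = {}
    for group in groups:
        for name in group:
            index[name] = group
    return index


def expand_query(name, index):
    """Return all known spellings of `name`, or just `name` if it is unknown."""
    key = name.strip().lower()
    return sorted(index.get(key, {key}))


def search(name, documents, index):
    """Return the documents that mention any equivalent spelling of `name`."""
    variants = set(expand_query(name, index))
    hits = []
    for doc in documents:
        # Split the document into lowercase alphabetic words.
        words = set(re.findall(r"[a-z]+", doc.lower()))
        if variants & words:
            hits.append(doc)
    return hits
```

With an index built by `build_index(EQUIVALENT_NAMES)`, a call such as `search("Ramesh", documents, index)` would also return documents that only mention Ramish or Romesh, while a name missing from the thesaurus is searched as typed.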
I find this basic idea very useful not only for names but also for multimedia search. In 1995 I had a research project on image search built essentially on this idea. The project, like many others, did not go anywhere: the student never got it working because of his total incompetence. But then that is a common case in research life. I got busy with other things that appeared more important at that time. This article reminded me how this idea could help image search in several popular engines, which is currently so bad.