Sunday, July 30, 2006

Thinking about search engines

At a conference in Boston sponsored by AAAI last Tuesday, Berners-Lee argued that part of the Semantic Web is about identifying the originator of information and why that information can be trusted, not just the content of the information itself. By identifying the creators of web content, the Semantic Web could help solve the problem of deception on the Internet. The link between the Semantic Web and author identification is also made in another paper, which introduces the Flink project.
This argument leads me to think about ontology again. If you google a keyword, the first several results seem to be determined by ontology, that is, by what kind of thing the keyword refers to. If you google a person, you get his or her weblog, since it collects the most complete information about that person. If there is no weblog, you get the person's homepage hosted by his or her workplace. If that doesn't exist either, ... In any case, these web sources are far more heterogeneous than a single database like InforZoom in A9 people search. If you google a book, you get its Amazon page and its author's blog, as with Smart Mobs and Freakonomics. If you search for a film, you get its IMDb entry, and so on. This ontology-based ranking does not follow directly from Google's PageRank method, except insofar as, for example, every book mentioned on a website links to Amazon and every paper links to its DOI, so these canonical pages accumulate links.
Of course, a keyword can have several contexts; the classic example for computer scientists is "Java". Here, temporal context can narrow down the ontology: before the Java language was defined and the coffee was produced, "Java" could only have referred to the island. Likewise, before The Da Vinci Code was made into a film, the search result was the book. Google might already use such a criterion, since the recently launched Google Trends could be used to derive an ontology from temporal facts.
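The two ideas above, narrowing a keyword's meaning by date and falling back from one preferred source to the next, can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not how Google or any real search engine works: the sense inventory, the entity kinds, the source names and the functions `senses_at` and `first_available` are all hypothetical. The only real dates used are the Java language (1995), the Da Vinci Code novel (2003) and its film (2006).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sense:
    label: str            # what the keyword refers to
    kind: str             # entity type, used to pick preferred sources
    since: Optional[int]  # year the sense came into existence; None = always existed


# Hypothetical sense inventory (illustration only).
SENSES = {
    "java": [
        Sense("Java (island)", "place", None),
        Sense("Java (programming language)", "software", 1995),
    ],
    "the da vinci code": [
        Sense("The Da Vinci Code (novel)", "book", 2003),
        Sense("The Da Vinci Code (film)", "film", 2006),
    ],
}

# Hypothetical fallback chains: try each source kind in order.
PREFERRED_SOURCES = {
    "person": ["weblog", "workplace homepage"],
    "book": ["Amazon", "author's blog"],
    "film": ["IMDb"],
}


def senses_at(keyword: str, year: int) -> list[Sense]:
    """Return the senses of `keyword` that already existed in `year`."""
    return [s for s in SENSES.get(keyword.lower(), [])
            if s.since is None or s.since <= year]


def first_available(kind: str, available: set[str]) -> Optional[str]:
    """Walk the fallback chain for `kind`; return the first source that exists."""
    for source in PREFERRED_SOURCES.get(kind, []):
        if source in available:
            return source
    return None


if __name__ == "__main__":
    print([s.label for s in senses_at("Java", 1990)])               # ['Java (island)']
    print([s.label for s in senses_at("The Da Vinci Code", 2004)])  # ['The Da Vinci Code (novel)']
    print(first_available("person", {"workplace homepage"}))        # workplace homepage
```

Note that the date filter only removes meanings that did not exist yet; once both the novel and the film exist, the keyword is ambiguous again and some other signal would be needed to rank them.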
An ontology-based search engine might strengthen the web giants in each field, such as Amazon and Wikipedia. Is that a good way to solve the problem of deception on the Internet?