Quest to mine the far corners of the web
5 November 2001

Search engines have come a long way since they first trawled the web's relatively compact dimensions back in the mid-1990s.

Where early engines dug up pages of largely irrelevant information in only one language (English) and one format (HTML), the search engines of today are not only far more likely to yield accurate results, but also mine images, multilingual message boards, audio files and databases in the process.

If the ultimate search engine can be defined as one "that would understand exactly what you meant and give you back exactly what you want," in the words of Google co-founder Larry Page, then search engine science is still in its infancy. With search currently the second most popular internet activity after e-mail, the race to create the "ultimate" search engine is being hotly contested in a packed marketplace.

One of the key challenges facing search companies is understanding the context for queries. "We give search engines so little information that it is impossible for them to give us back exactly what we are looking for," says Danny Sullivan, search engine analyst and editor of the web-based search engine resource, SearchEngineWatch.com.

For Mr Sullivan, the future of search lies in creating personal profiles of searchers. Thus if the engine knows that a searcher lives in the UK when they type "football" into the query box, it is more likely to bring up information about Manchester United than the San Francisco 49ers.

Companies such as the UK-based data retrieval company NCorp and the US-based Buzznotes are already beginning to leverage personal information about users to return more accurate results. NCorp uses information about a user's previous hits, clickstream habits and specified interests to tailor its results pages.
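
The kind of tailoring described above can be illustrated with a minimal Python sketch. It is not NCorp's or any other vendor's actual method: the Profile and Result classes, the field names, the example hosts and the boost weights are all hypothetical, chosen only to show how a searcher's country, stated interests and previously clicked sites might nudge the ranking for the query "football".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Profile:
    """Hypothetical searcher profile (all fields are illustrative)."""
    country: str
    interests: Set[str] = field(default_factory=set)
    clicked_hosts: Set[str] = field(default_factory=set)


@dataclass
class Result:
    """Hypothetical search result with a profile-independent base score."""
    host: str
    base_score: float
    country: Optional[str] = None
    topics: Set[str] = field(default_factory=set)


def personalised_score(result: Result, profile: Profile) -> float:
    """Add small, arbitrary boosts when a result matches the profile."""
    score = result.base_score
    if result.country == profile.country:
        score += 0.3  # assumed weight for a local result
    if result.topics & profile.interests:
        score += 0.2  # assumed weight for a matching stated interest
    if result.host in profile.clicked_hosts:
        score += 0.1  # assumed weight for a previously visited site
    return score


def rerank(results: List[Result], profile: Profile) -> List[Result]:
    """Return results ordered by personalised score, highest first."""
    return sorted(results,
                  key=lambda r: personalised_score(r, profile),
                  reverse=True)


# Two made-up results for the query "football".
results = [
    Result("49ers.example", 0.6, country="US", topics={"american football"}),
    Result("manutd.example", 0.5, country="UK", topics={"soccer"}),
]

uk_user = Profile(country="UK", interests={"soccer"})
us_user = Profile(country="US")

print([r.host for r in rerank(results, uk_user)])
print([r.host for r in rerank(results, us_user)])
```

With these made-up numbers the UK searcher sees the Manchester United page first and the US searcher sees the 49ers page first, even though both typed the same word. A real engine would have to learn or tune such weights rather than hard-code them.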

"We are starting to make the connection between the user and the web rather than assuming that one size fits all," says Ian Hegarty, technical architect at AltaVista Europe.

However, privacy and technological issues stand in the way of what would otherwise be a lucrative avenue for search providers. Search company Inktomi has just started experimenting with personalised searches within the corporate arena, using information that is already available about employees to create customised searches within a closed environment.

Taking the technology public is not on the agenda, at least for now. "The issue of privacy makes it hard to move this technology into the public environment. You can't just go leveraging what you know about a person, even if it means getting better search results," says Troy Toman, vice president and general manager of search solutions at Inktomi.

As the web grows and the search companies' task becomes increasingly Herculean, an entire industry is springing up around getting a website listed high on a search engine results page, now that advertising has become a regular feature of those pages.

Some search engines, such as LookSmart and Espotting, specialise in pay-for-performance content. Meanwhile, companies such as NetBooster specifically help advertisers find strategies for getting to the top of the results page. "Paid listings are not going to go away for a long time," says Mr Sullivan.

Consumers are becoming increasingly aware of the growing imbalance between editorial content and advertising, yet are frequently unable to differentiate between the two. In July, the Oregon-based consumer group Commercial Alert accused seven internet search providers of deceptive advertising practices.

The complaint alleged that the companies incorporated paid content from advertisers into their search results without telling consumers. "People are being shown too many paid links in comparison to editorial content. Search engines are going to have to provide a filter for search results as well as ads," says Mr Sullivan.

Elsewhere in the world of search, the quest to mine the farthest corners of the web has become a primary pursuit, with companies such as Google and AltaVista spearheading search into an array of different sources in different languages, from PDF and MP3 files to images and newsgroups.

"As useful as the web is, so little of the world's information is found on HTML text pages. Our goal is to make the rest of that content available to people, too," says Craig Silverstein, chief technologist at Google.

In tandem with these developments, search companies are working towards bringing results from disparate sources together on a single page, rather than having separate searches for different formats. "The trend towards seamlessly integrating results from multiple information sources will continue, because consumers want to see information from authoritative sources and people and brands they trust," says Bill Bliss, general manager of MSN Search.

But despite the efforts of search providers to access the "deep" web, large amounts of proprietary information, such as pay-per-view newspaper archives, are likely to remain elusive for a long time.

"People who have collected large amounts of information are often quite possessive of it; it's their intellectual property," says Mr Silverstein. "We have to figure out how to make the data available in a way that is helpful to our users but also satisfies the owner of the data."

For some search providers, such as Convera (formerly Excalibur Technologies), the future of search lies in the automation of human thought and linguistic processes. Convera's product aims to abstract the meaning of documents rather than just look at their syntactic properties, such as matching keywords.

Some people are sceptical about the success of such "semantic engines". "The history of trying to bridge the syntactic-semantic cut in artificial intelligence has been a history of ignominy," says Anil Seth, post-doctoral fellow in theoretical neurobiology at the Neurosciences Institute in San Diego. "Semantics cannot simply be encoded or decoded from a syntactic foundation. Too many other factors, such as culture and natural language, get in the way."

Truly successful artificial intelligence systems might be far off, but at Inktomi plans are afoot to put humans themselves onto the search results page. Inktomi is developing a system that allows staff to search for experts within their organisation. "We are beginning to look at a person as a valuable piece of content within a network," says Mr Toman.

It looks as though humans are going to be an integral part of search engine technology for quite a while.

© Copyright The Financial Times Limited 2001