Quest to mine the far corners of the web
5 November 2001
Search engines have come a long way since they first
trawled the web's relatively compact dimensions back in the mid-1990s.
From digging up pages of largely irrelevant information
in only one language (English) and format (HTML), the search engines of
today are not only far more likely to yield accurate results, but will
mine images and multi-lingual message boards, audio files and databases
in the process of yielding them.
If the ultimate search engine can be defined as one
"that would understand exactly what you meant and give you back exactly
what you want," in the words of Google co-founder Larry Page, then
search engine science is still in its infancy. With search
currently the second most popular internet activity after e-mail, the
race to create the "ultimate" search engine is being hotly contested
in a packed marketplace.
One of the key challenges facing search companies
is understanding the context for queries. "We give search engines so little
information that it is impossible for them to give us back exactly what
we are looking for," says Danny Sullivan, search engine analyst and editor
of the web-based search engine resource, SearchEngineWatch.com.
For Mr Sullivan, the future of search lies in creating
personal profiles of searchers. Thus if the engine knows a searcher lives
in the UK when they type "football" into the query box, it is more likely
to bring up information about Manchester United than the San Francisco
49ers.
Companies such as the UK-based data retrieval company
NCorp and the US-based Buzznotes are already beginning to leverage personal
information about users to return more accurate results. NCorp uses information
about a user's previous hits, clickstream habits and specified interests
to tailor its results pages.
"We are starting to make the connection between the
user and the web rather than assuming that one size fits all," says Ian
Hegarty, technical architect at AltaVista Europe.
However, privacy and technological issues stand in
the way of what would otherwise be a lucrative avenue for search providers.
Search company Inktomi has just started experimenting with personalised
searches within the corporate arena, using information that is already
available about employees to create customised searches within a closed
environment.
Taking the technology public is not on the agenda,
at least for now. "The issue of privacy makes it hard to move this technology
into the public environment. You can't just go leveraging what you know
about a person, even if it means getting better search results," says
Troy Toman, vice president and general manager of search solutions at
Inktomi.
As the web gets bigger and the search companies' task
becomes increasingly Herculean, an entire industry is springing up around
the business of getting a website listed high on a search engine results
page, now that advertising has become a regular feature of those
pages.
Some search engines, such as LookSmart and Espotting,
specialise in pay-for-performance content. Meanwhile, companies such as
NetBooster specifically help advertisers find strategies for getting to
the top of the results page. "Paid listings are not going to go away for
a long time," says Mr Sullivan.
Consumers are becoming increasingly aware of the growing
imbalance between editorial content and advertising, yet are frequently
unable to differentiate between the two. In July, the Oregon-based consumer
group Commercial Alert accused seven internet search providers of deceptive
advertising practices.
The complaint alleged that the companies incorporated
paid content from advertisers into their search results without telling
consumers. "People are being shown too many paid links in comparison to
editorial content. Search engines are going to have to provide a filter
for search results as well as ads," says Mr Sullivan.
Elsewhere in the world of search, the quest to mine
the farthest corners of the web has become a primary pursuit, with companies
such as Google and AltaVista spearheading search into an array of different
sources in different languages, from PDF and MP3 files to images and
newsgroups.
"As useful as the web is, so little of the world's
information is found on HTML text pages. Our goal is to make the rest
of that content available to people, too," says Craig Silverstein, chief
technologist at Google.
In tandem with these developments, search companies
are working towards bringing results from disparate sources together on
a single page, rather than having separate searches for different formats.
"The trend towards seamlessly integrating results from multiple information
sources will continue, because consumers want to see information from
authoritative sources and people and brands they trust," says Bill Bliss,
general manager of MSN Search.
But despite the efforts of search providers to access
the "deep" web, large amounts of proprietary information, such as pay-per-view
newspaper archives, are likely to remain elusive for a long time.
"People who have collected large amounts of information
are often quite possessive of it; it's their intellectual property," says
Mr Silverstein. "We have to figure out how to make the data available
in a way that is helpful to our users but also satisfies the owner of
the data."
For some search providers, such as Convera (formerly
Excalibur Technologies), the future of search lies in the automation of
human thought and linguistic processes. Convera's product aims to abstract
the meaning of documents rather than just look at their syntactic properties,
such as matching keywords.
Some people are sceptical about the success of such
"semantic engines". "The history of trying to bridge the syntactic-semantic
cut in artificial intelligence has been a history of ignominy," says Anil
Seth, post-doctoral fellow in theoretical neurobiology at the Neurosciences
Institute in San Diego. "Semantics cannot simply be encoded or decoded
from a syntactic foundation. Too many other factors, such as culture and
natural language, get in the way."
Truly successful artificial intelligence systems might
be far off, but at Inktomi plans are afoot to put humans themselves onto
the search results page. Inktomi is developing a system that allows staff
to search for experts within their organisation. "We are beginning to
look at a person as a valuable piece of content within a network," says
Mr Toman.
It looks as though humans are going to be an integral
part of search engine technology for quite a while.
© Copyright The Financial Times Limited 2001