The full version of the Report on the 1999 Search Engines Meeting by Avi Rappoport, Search Tools Consulting, is at: http://www.searchtools.com/info/meetings/searchenginesmtg/index.html
Portalization and Other Search Trends (by Danny Sullivan
of SearchEngineWatch).
Main trends highlighted: search engines turning into portals; the increasing relevance of common searches like "travel" or "microsoft"; clustering and directories; etc.
Quantifiable Results: Testing at TREC
Valuable testing was done at TREC (the Text REtrieval Conference), sponsored by NIST. TREC provides a set of realistic test collections, uniform scoring, unbiased evaluators, and a chance to see the changes and improvements in search engines over time.
The TREC test collection consists of about 2 GB of combined newspaper
articles and government reports.
Testing includes several tracks: Ad Hoc, Cross-Language, Filtering, High Precision, Interactive, Query, and Spoken Document Retrieval (SDR).
Results are published in the proceedings of the annual conferences at http://trec.nist.gov/pubs.html
Summarization
Summarization attempts to reduce document text to its most relevant
content based on the task and user requirements.
Results indicated that many documents can be summarized successfully, with better results from variable-length summaries. The Information Retrieval methods applied to this task work well for query-focused summarization, because the topic focuses the summarization effort.
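As a rough illustration of query-focused extractive summarization, the sketch below scores each sentence by its overlap with the query terms and keeps the highest-scoring ones. The sentence splitter and scoring scheme are simplifying assumptions for illustration only, not the SUMMAC systems themselves.

import re

def query_focused_summary(text, query, max_sentences=3):
    """Pick the sentences that share the most terms with the query.

    A toy extractive summarizer: real SUMMAC systems used far richer
    features, but the idea of letting the topic focus the summary is
    the same.
    """
    # Naive sentence splitting on ., ! and ? (an assumption).
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    query_terms = set(query.lower().split())

    def score(sentence):
        words = set(re.findall(r'\w+', sentence.lower()))
        return len(words & query_terms)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = [s for s in ranked[:max_sentences] if score(s) > 0]
    # Restore original document order for readability.
    return ' '.join(s for s in sentences if s in chosen)

if __name__ == '__main__':
    doc = ("TREC provides realistic test collections. "
           "Summarization reduces a document to its most relevant content. "
           "Variable-length summaries scored better than fixed-length ones.")
    print(query_focused_summary(doc, "variable length summaries"))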
Valuable information on this issue can be found at the Natural Language Processing & Information Retrieval (NLPIR) group of NIST's ITL (http://www.itl.nist.gov/iaui/894.02/).
In May 1998, the U.S. government completed the TIPSTER Text Summarization Evaluation (SUMMAC), which was the first large-scale, developer-independent evaluation of automatic text summarization systems. Results are available to TREC subscribers; the final report can be downloaded from http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
Results Clustering and Topic Categorization
Clustering the retrieved documents into useful groups is a fruitful approach to improving results presentation.
Some search engines perform automatic clustering and categorization on result sets, so that results are divided into groups by topic. The NorthernLight Search Engine, for example, clusters its results into Custom Folders, which are partly based on predefined categories.
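As a minimal sketch of clustering a result set by topic (not Northern Light's Custom Folders algorithm, which relies partly on predefined categories), result snippets can be grouped with TF-IDF vectors and k-means; scikit-learn is assumed to be available.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_results(snippets, n_clusters=2):
    """Group result snippets by topic using TF-IDF vectors and k-means."""
    vectors = TfidfVectorizer(stop_words='english').fit_transform(snippets)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(vectors)
    groups = {}
    for snippet, label in zip(snippets, labels):
        groups.setdefault(label, []).append(snippet)
    return groups

if __name__ == '__main__':
    hits = [
        "cheap flights and hotel deals for travel",
        "travel insurance for international trips",
        "microsoft windows support and downloads",
        "microsoft office product documentation",
    ]
    for label, members in cluster_results(hits).items():
        print(label, members)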
The academic case made by James Callan of the University of Massachusetts showed that full-text search with modern relevance ranking is the best approach for information retrieval.
The consensus of the panel, and of the meeting, was that automation can help humans, and that automated categorization works best when humans can provide a reality check on the systems.
Cross-Language Information Retrieval (CLIR)
CLIR means querying in one language for documents in many languages. It is becoming more important due to the internationalisation of the web. Approaches include machine-readable dictionaries, parallel and comparable corpora, the generalized vector space model, latent semantic indexing, similarity thesauri, and interlinguas.
A presentation by TextWise (http://www.textwise.com/) described its Conceptual Interlingua approach, which uses a concept space in which terms from multiple languages are mapped into a language-independent schema. The technique is used for both indexing and querying and does not require pairwise translation.
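A toy sketch of the concept-space idea: terms from different languages map to shared, language-independent concept identifiers, and documents and queries are compared in that space rather than translated pairwise. The mapping table below is invented purely for illustration; TextWise's Conceptual Interlingua is far richer than this.

import numpy as np

# Invented toy concept space: terms from several languages map to the
# same language-independent concept ids.
CONCEPTS = {"travel": 0, "voyage": 0, "reise": 0,
            "hotel": 1, "hôtel": 1,
            "price": 2, "prix": 2, "preis": 2}
N_CONCEPTS = 3

def to_concept_vector(text):
    """Index or query text becomes a normalized bag-of-concepts vector."""
    v = np.zeros(N_CONCEPTS)
    for term in text.lower().split():
        if term in CONCEPTS:
            v[CONCEPTS[term]] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def search(query, documents):
    """Rank documents in any language against a query in any language."""
    q = to_concept_vector(query)
    scored = [(float(np.dot(q, to_concept_vector(d))), d) for d in documents]
    return sorted(scored, reverse=True)

if __name__ == '__main__':
    docs = ["hôtel prix voyage", "reise preis", "unrelated words"]
    for score, doc in search("travel hotel price", docs):
        print(round(score, 2), doc)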
Improvements to Relevance Ranking of Results
Two presentations were given, by Byron Dom of IBM's CLEVER project (http://www.almaden.ibm.com/cs/k53/clever.html) and by Gary Culliss, the chairman of Direct Hit (http://www.directhit.com/).
Directories and Question-Answering
This section dealt with the current move of search engines to provide directories and subject gateways alongside ordinary or advanced searches. Presentations were given by LookSmart (http://www.looksmart.com/) and AskJeeves (http://www.askjeeves.com/).
Knowledge Management
Both Daniel Hoogterp of Retrieval Technologies and Rick Kenny of PCDocs
/ Fulcrum described how search fits into corporate knowledge management.
Text Mining
Data mining means evaluating large amounts of stored data in search of useful patterns, such as a relation between a product and the age of its customers. Text mining uses techniques from information retrieval and other fields to analyze internal structure, parse the content, and provide results, clustering, summarization, and so on. With automatic event identification, conditional responses, reuse of analysis, and graphical presentation of results, the user can skim the best of the information easily.
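As one small, concrete example of mining a pattern from text (an illustration only, not any presenter's system), simple co-occurrence counts can surface the terms that most often appear near a target term:

from collections import Counter
import re

def cooccurring_terms(documents, target, window=5, top_n=5):
    """Find terms that frequently appear near a target term.

    Co-occurrence counts are one of the simplest ways to surface
    relationships, e.g. between a product name and the words
    customers use around it.
    """
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r'\w+', doc.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                nearby = (tokens[max(0, i - window):i] +
                          tokens[i + 1:i + 1 + window])
                counts.update(t for t in nearby if t != target)
    return counts.most_common(top_n)

if __name__ == '__main__':
    docs = ["younger customers bought the new camera online",
            "the camera was returned by several older customers",
            "camera reviews mention battery life and price"]
    print(cooccurring_terms(docs, "camera"))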
Filtering and Routing and Intelligent Agents
Filtering and Routing allow individuals to set up criteria for incoming data (news feeds, email, press releases, etc.) and be notified of, or sent, only those items that match their interests. Such tasks are performed by intelligent agents that travel a network or the Internet to locate data or track web site changes, evaluating items using relevance judgments like those of search engines.
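A minimal content-based filter along these lines might compare each incoming item against a user's interest profile as TF-IDF vectors and keep only items above a similarity threshold. The profile text and the threshold value below are assumptions for illustration, and scikit-learn is assumed to be available.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def route_items(profile, items, threshold=0.1):
    """Return only the incoming items that match a user's interest profile."""
    vectorizer = TfidfVectorizer(stop_words='english')
    matrix = vectorizer.fit_transform([profile] + list(items))
    # First row is the profile; compare it against every incoming item.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return [(float(s), item) for s, item in zip(scores, items) if s >= threshold]

if __name__ == '__main__':
    profile = "information retrieval search engines text summarization"
    incoming = ["press release: new summarization engine announced",
                "quarterly earnings of a shoe manufacturer",
                "workshop on cross-language information retrieval"]
    for score, item in sorted(route_items(profile, incoming), reverse=True):
        print(round(score, 2), item)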
Searching Multimedia
The main discussion was about spoken-document and video retrieval.
Search Realities faced by end users and professional searchers
Carol Tenopir gave a presentation on the history of user-centered research on searching and on current work in testing user experiences.
Visualization
There are some attempts to visualise search results based on document similarity. It was suggested that the success of this approach depends very strongly on the needs and experience of the searcher.
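A rough sketch of similarity-based visualization (assuming scikit-learn and matplotlib, and not reflecting any particular system shown at the meeting): reduce TF-IDF vectors of the result snippets to two dimensions so that textually similar results land near each other on a scatter plot.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import matplotlib.pyplot as plt

def plot_result_map(snippets):
    """Project result snippets into 2-D by similarity and plot them."""
    vectors = TfidfVectorizer(stop_words='english').fit_transform(snippets)
    coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(vectors)
    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), text in zip(coords, snippets):
        plt.annotate(text[:20], (x, y))
    plt.show()

if __name__ == '__main__':
    plot_result_map(["travel deals and flights", "flight booking and hotels",
                     "microsoft windows update", "windows driver downloads"])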