The full version of the Report on the 1999 Search Engines Meeting by Avi Rappoport, Search Tools Consulting, is at: http://www.searchtools.com/info/meetings/searchenginesmtg/index.html
Portalization and Other Search Trends (by Danny Sullivan
of SearchEngineWatch).
Main trends highlighted: search engines turning into portals; the increasing relevance of common searches like "travel" or "microsoft"; clustering and directories; etc.
Quantifiable Results: Testing at TREC
Valuable testing was done at TREC (the Text REtrieval Conference), sponsored by NIST. TREC provides a set of realistic test collections, uniform scoring, unbiased evaluators, and a chance to see the changes and improvements in search engines over time.
The TREC test collection consists of about 2 GB of combined newspaper
articles and government reports.
Testing includes several tracks: Ad Hoc, Cross-Language, Filtering, High Precision, Interactive, Query, and Spoken Document Retrieval (SDR).
Results are published in the proceedings of the annual conferences at http://trec.nist.gov/pubs.html
Summarization
Summarization attempts to reduce document text to its most relevant
content based on the task and user requirements.
Results indicated that many documents can be summarized successfully, with better results from variable-length summaries. The Information Retrieval methods applied to this task work well for query-focused summarization, because the topic focuses the summarization effort.
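As a rough illustration of query-focused extractive summarization, the sketch below scores each sentence by its overlap with the query terms and keeps the highest-scoring ones. The sentence splitter and scoring scheme are simplifying assumptions for illustration only, not the SUMMAC systems themselves.

import re

def query_focused_summary(text, query, max_sentences=3):
    """Pick the sentences that share the most terms with the query.

    A toy extractive summarizer: real SUMMAC systems used far richer
    features, but the idea of letting the topic focus the summary is
    the same.
    """
    # Naive sentence splitting on ., ! and ? (an assumption).
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    query_terms = set(query.lower().split())

    def score(sentence):
        words = set(re.findall(r'\w+', sentence.lower()))
        return len(words & query_terms)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = [s for s in ranked[:max_sentences] if score(s) > 0]
    # Restore original document order for readability.
    return ' '.join(s for s in sentences if s in chosen)

if __name__ == '__main__':
    doc = ("TREC provides realistic test collections. "
           "Summarization reduces a document to its most relevant content. "
           "Variable-length summaries scored better than fixed-length ones.")
    print(query_focused_summary(doc, "variable length summaries"))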
Valuable information on this issue can be found at the Natural Language Processing & Information Retrieval (NLPIR) group of NIST's ITL (http://www.itl.nist.gov/iaui/894.02/).
In May 1998, the U.S. government completed the TIPSTER Text Summarization Evaluation (SUMMAC), which was the first large-scale, developer-independent evaluation of automatic text summarization systems. Results are available to TREC subscribers; the final report can be downloaded from http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
Results Clustering and Topic Categorization
Clustering the retrieved documents into useful groups is a fruitful approach to improving results presentation.
Some search engines perform automatic clustering and categorization on result sets, so that results are divided into groups by topic. The NorthernLight Search Engine, for example, clusters its results into Custom Folders, which are partly based on predefined categories.
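As a minimal sketch of clustering a result set by topic (not Northern Light's Custom Folders algorithm, which relies partly on predefined categories), result snippets can be grouped with TF-IDF vectors and k-means; scikit-learn is assumed to be available.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_results(snippets, n_clusters=2):
    """Group result snippets by topic using TF-IDF vectors and k-means."""
    vectors = TfidfVectorizer(stop_words='english').fit_transform(snippets)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(vectors)
    groups = {}
    for snippet, label in zip(snippets, labels):
        groups.setdefault(label, []).append(snippet)
    return groups

if __name__ == '__main__':
    hits = [
        "cheap flights and hotel deals for travel",
        "travel insurance for international trips",
        "microsoft windows support and downloads",
        "microsoft office product documentation",
    ]
    for label, members in cluster_results(hits).items():
        print(label, members)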
The academic case made by James Callan of the University of Massachusetts showed that full-text search with modern relevance ranking is the best approach for information retrieval.
The consensus of the panel, and of the meeting, was that automation can help humans, and that automated categorization works best when humans can provide a reality check on the systems.
Cross-Language Information Retrieval (CLIR)
CLIR means querying in one language for documents in many languages. It is becoming more important due to the internationalisation of the web. Approaches include machine-readable dictionaries, parallel and comparable corpora, the generalized vector space model, latent semantic indexing, similarity thesauri, and interlinguas.
A presentation by TextWise (http://www.textwise.com/) described its Conceptual Interlingua approach, which uses a concept space in which terms from multiple languages are mapped into a language-independent schema. The technique is used for both indexing and querying and does not require pairwise translation.
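A toy sketch of the concept-space idea: terms from different languages map to shared, language-independent concept identifiers, and documents and queries are compared in that space rather than translated pairwise. The mapping table below is invented purely for illustration; TextWise's Conceptual Interlingua is far richer than this.

import numpy as np

# Invented toy concept space: terms from several languages map to the
# same language-independent concept ids.
CONCEPTS = {"travel": 0, "voyage": 0, "reise": 0,
            "hotel": 1, "hôtel": 1,
            "price": 2, "prix": 2, "preis": 2}
N_CONCEPTS = 3

def to_concept_vector(text):
    """Index or query text becomes a normalized bag-of-concepts vector."""
    v = np.zeros(N_CONCEPTS)
    for term in text.lower().split():
        if term in CONCEPTS:
            v[CONCEPTS[term]] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def search(query, documents):
    """Rank documents in any language against a query in any language."""
    q = to_concept_vector(query)
    scored = [(float(np.dot(q, to_concept_vector(d))), d) for d in documents]
    return sorted(scored, reverse=True)

if __name__ == '__main__':
    docs = ["hôtel prix voyage", "reise preis", "unrelated words"]
    for score, doc in search("travel hotel price", docs):
        print(round(score, 2), doc)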
Improvements to Relevance Ranking of Results
Two presentations were given, by Byron Dom of IBM's CLEVER project (http://www.almaden.ibm.com/cs/k53/clever.html) and by Gary Culliss, the chairman of Direct Hit (http://www.directhit.com/).
Directories and Question-Answering
This section dealt with the current move of search engines to provide directories and subject gateways alongside ordinary or advanced searches. Presentations were given by LookSmart (http://www.looksmart.com/) and AskJeeves (http://www.askjeeves.com/).
Knowledge Management
Both Daniel Hoogterp of Retrieval Technologies and Rick Kenny of PCDocs
/ Fulcrum described how search fits into corporate knowledge management.
Text Mining
Data mining means evaluating large amounts of stored data in search of useful patterns, such as a relation between a product and the age of its customers. Text mining uses techniques from information retrieval and other fields to analyze internal structure, parse the content, and provide results, clustering, summarization, and so on. With automatic event identification, conditional responses, reuse of analysis, and graphical presentation of results, the user can skim the best of the information easily.
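As one small, concrete example of mining a pattern from text (an illustration only, not any presenter's system), simple co-occurrence counts can surface the terms that most often appear near a target term:

from collections import Counter
import re

def cooccurring_terms(documents, target, window=5, top_n=5):
    """Find terms that frequently appear near a target term.

    Co-occurrence counts are one of the simplest ways to surface
    relationships, e.g. between a product name and the words
    customers use around it.
    """
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r'\w+', doc.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                nearby = (tokens[max(0, i - window):i] +
                          tokens[i + 1:i + 1 + window])
                counts.update(t for t in nearby if t != target)
    return counts.most_common(top_n)

if __name__ == '__main__':
    docs = ["younger customers bought the new camera online",
            "the camera was returned by several older customers",
            "camera reviews mention battery life and price"]
    print(cooccurring_terms(docs, "camera"))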
Filtering and Routing and Intelligent Agents
Filtering and Routing allow individuals to set up criteria for incoming data (news feeds, email, press releases, etc.) and be notified of, or sent, only those items that match their interests. Such tasks are performed by intelligent agents that travel a network or the Internet to locate data or track web site changes, evaluating items using relevance judgments like those of search engines.
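A minimal content-based filter along these lines might compare each incoming item against a user's interest profile as TF-IDF vectors and keep only items above a similarity threshold. The profile text and the threshold value below are assumptions for illustration, and scikit-learn is assumed to be available.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def route_items(profile, items, threshold=0.1):
    """Return only the incoming items that match a user's interest profile."""
    vectorizer = TfidfVectorizer(stop_words='english')
    matrix = vectorizer.fit_transform([profile] + list(items))
    # First row is the profile; compare it against every incoming item.
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return [(float(s), item) for s, item in zip(scores, items) if s >= threshold]

if __name__ == '__main__':
    profile = "information retrieval search engines text summarization"
    incoming = ["press release: new summarization engine announced",
                "quarterly earnings of a shoe manufacturer",
                "workshop on cross-language information retrieval"]
    for score, item in sorted(route_items(profile, incoming), reverse=True):
        print(round(score, 2), item)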
Searching Multimedia
The main discussion was about spoken-document and video retrieval.
Search Realities faced by end users and professional searchers
Carol Tenopir gave a presentation on the history of user-centered research on searching and on current work in testing user experiences.
Visualization
There are some attempts to visualise search results based on document similarity. It was suggested that the success of this approach depends very strongly on the needs and experience of the searcher.
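A rough sketch of similarity-based visualization (assuming scikit-learn and matplotlib, and not reflecting any particular system shown at the meeting): reduce TF-IDF vectors of the result snippets to two dimensions so that textually similar results land near each other on a scatter plot.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import matplotlib.pyplot as plt

def plot_result_map(snippets):
    """Project result snippets into 2-D by similarity and plot them."""
    vectors = TfidfVectorizer(stop_words='english').fit_transform(snippets)
    coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(vectors)
    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), text in zip(coords, snippets):
        plt.annotate(text[:20], (x, y))
    plt.show()

if __name__ == '__main__':
    plot_result_map(["travel deals and flights", "flight booking and hotels",
                     "microsoft windows update", "windows driver downloads"])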