Search Engine Overview
Recent developments in Indexing, Searching and Information Retrieval Technologies


 
SE news
  • New largest Search Engine Alltheweb.com launched by Fast Search & Transfer
  • NREN Search and Index Services
    Special purposes Search Engines
    SE Special Services
    SE Technologies
  • Report on the 1999 Search Engines Meeting by Avi Rappoport
  • Search Engines Tools
  • Free Indexing and Searching Software
  • Commercial SW
  • SE tips and links
    Search Engine Projects
    Search Engines Papers
  • Research Papers related to Google!
  • Research Papers related to IBM CLEVER Searching Project
  • Other SE papers
  • SE Legal issues

    This page is updated regularly, please send your suggestions to: demchenko@terena.nl


    SE news

    Search Engines News
    http://searchenginewatch.com/news.html

    Current Search Engine Report
    http://searchenginewatch.com/sereport/current.html

    Search Engine Size
    http://www.searchenginewatch.com/reports/sizes.html

    News at Web Site Search Tools
    http://www.searchtools.com/info/news.html

    Results from our Site Search Tools Survey!
    http://www.searchtools.com/surveys/survey-results-01.html
    First results from our search tools survey are in, and they're interesting! Most web administrators who haven't installed a site search say it's because they don't have time or the applications are too complex. Those who have cite improved navigation as their number one reason, by far. More surprising results come from sites aimed towards information professionals (many don't have search), and sites with three or more languages (they have search).

    Websearch.miningco.com weekly
    http://websearch.miningco.com/library/weekly/topicmenu.htm?pid=2825&cob=home

    New largest Search Engine Alltheweb.com launched by FAST Search & Transfer

    August 2, 1999: FAST (Fast Search & Transfer) launched a new site called Alltheweb ("FAST Search: All the Web, All the Time"), http://www.alltheweb.com/. The announced size of its index is more than 200 million pages, estimated at 25% of the whole web.
     

    NREN Search and Index Services

    German Web Index
    http://www.fireball.de/
    Metagenerator - http://www.fireball.de/metagenerator.html
    Metadata scheme - http://www.fireball.de/meta_daten.html
    Fireball was developed by FLP/KIT - http://flp.cs.tu-berlin.de/
    KIT - http://flp.cs.tu-berlin.de/kit/kit.html

    Swiss search service
    http://www.search.ch/
    Allows metadata search - http://www.search.ch/help.html.en

    Nordic Web index
    http://nwi.ub2.lu.se/?lang=en
     

    Special purposes Search Engines

    US Government Search Engine Launched
    A new search engine that focuses on information from US government sources was opened in May. Called Gov.Search, the service is jointly produced by search engine Northern Light and the U.S. Commerce Department's National Technical Information Service through a five-year agreement.
    The service is unusual for the web in that searching is not free. Access costs US $15 for a day pass, $30 for a monthly pass or $250 for a year. Special pricing is also available to companies and organizations that require multiple accounts.
    Northern Light has now indexed about 4 million web pages located on more than 20,000 US government servers, which also include military and some educational sites. In addition to this information, it has also indexed about 2 million specialty records from the NTIS.
    http://searchenginewatch.com/sereport/99/06-govsearch.html
    Gov.Search
    http://www.usgovsearch.com

    Google US Government Search
    http://www.google.com/unclesam
    Google has its own US government search service. Test queries show it to be much smaller than Northern Light's index, yielding only 10 to 50 percent of Northern Light's counts. But the relevancy of some of the matches was impressive. Definitely worth a visit.

    Cora Search Engine
    http://www.cora.justresearch.com/about.html
    Cora is a special-purpose search engine covering computer science research papers.

    SE Special Services

    Northern Light Adds Research Options
    Northern Light now also operates a "research" version of its service, where the default is to search within its Special Collection index. This index has information from over 5,400 publications, much of which is not available on the web. Searching is free; documents can then be purchased for between $1 and $4.
    Titles can be downloaded from http://www.northernlight.com/docs/specoll_help_download.html
    http://searchenginewatch.com/sereport/99/06-northernlight.html
    Northern Light Research Version
    http://www.nlresearch.com/ (http://www.northernlight.com/research.html )
    Northern Light Special Editions
    http://special.northernlight.com/

    Research Service at HotBot
    http://r.hotbot.com/r/hb_also_rsrch/http://www.elibrary.com/s/hotbot/

    "Invisible Web" Revealed
    Lycos and IntelliSeek have teamed up to produce an index of search databases to help users find information that is invisible to search engines. The "Invisible Web Catalog" provides links to more than 7,000 specialty search resources. Users can browse the listings or search the Lycos index.
    http://searchenginewatch.com/sereport/99/07-invisible.html
    Lycos Invisible Web Catalog
    http://dir.lycos.com/Reference/Searchable_Databases/

    IntelliSeek
    http://www.intelliseek.com/

    Direct Search
    http://gwis2.circ.gwu.edu/~gprice/direct.htm
    Catalog of specialty databases. Lets you search inside a particular database.

    WebData
    http://www.webdata.com/
    Guide to searchable databases. Browse or search through listings.

    Northern Light Adds Clustering
    Clustering is intended to prevent results from a single site dominating the list.
    In addition to the page index, Northern Light provides a list of Custom Search Folders™, generated by clustering the search results by group of servers or by type of page.
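    A minimal Python sketch of site-based result clustering follows. It assumes results arrive as a ranked list of URLs and groups them by host name. The function name and the `per_site` cap are illustrative assumptions. Northern Light's actual folder-generation method is not described here.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls, per_site=2):
    """Group a ranked result list by host and cap each host's share.

    Returns (top, folders): `top` keeps the original ranking order but
    shows at most `per_site` hits per host; `folders` maps every host
    to all of its hits, like a per-site folder (hypothetical helper).
    """
    folders = defaultdict(list)
    top = []
    for url in ranked_urls:
        host = urlsplit(url).hostname or ""
        folders[host].append(url)
        # Only the first `per_site` hits from a host reach the main list.
        if len(folders[host]) <= per_site:
            top.append(url)
    return top, dict(folders)
```

    With `per_site=2`, a fifth result from an already-seen government server moves into that server's folder instead of pushing other sites off the first page.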
    http://www.northernlight.com/docs/search_help_folders.html

    Navigate web smarter and easier with Alexa
    http://www.alexa.com/

    Netscape's keywords service
    http://home.netscape.com/escapes/keywords/
     

    SE Technologies

    Report on the 1999 Search Engines Meeting
    by Avi Rappoport, Search Tools Consulting
    http://www.searchtools.com/info/meetings/searchenginesmtg/index.html

    The main questions discussed:

    For more information read Abstracts of the Report.

    Natural Language Processing & Information Retrieval (NLPIR) group of ITL NIST (http://www.itl.nist.gov/iaui/894.02/)
    Valuable information. Publications http://www.itl.nist.gov/iaui/894.02/works.html

    Information on DARPA TIPSTER Text Program http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/
    http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
     

    IBM Patents Network -
    http://www.patents.ibm.com/

    Lycos holds patent 5,748,954
    (http://www.patents.ibm.com/details?pn=US05748954__&s_clms=1#clms ), which covers roughly any kind of web spider that heuristically downloads "better" documents before "worse" documents, and explicitly includes a reference to looking at how often a document is linked as a goodness heuristic.
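    The patent description covers a kind of crawler that can be sketched as a best-first spider. This spider always fetches the frontier page with the most inbound links seen so far. The in-memory `graph` dictionary stands in for real fetching and link extraction. The function is hypothetical and is not Lycos code.

```python
import heapq
from collections import Counter


def best_first_crawl(graph, seeds, limit):
    """Visit up to `limit` pages, always fetching the frontier page
    with the most inbound links discovered so far.

    `graph` maps a URL to the URLs it links to; it is a stand-in for
    HTTP fetching and link extraction (hypothetical input).
    """
    inlinks = Counter()
    seen = set()
    visited = []
    heap = [(0, url) for url in seeds]  # (-inlink count, url)
    heapq.heapify(heap)
    while heap and len(visited) < limit:
        neg_count, url = heapq.heappop(heap)
        if url in seen or -neg_count != inlinks[url]:
            continue  # already fetched, or a stale priority entry
        seen.add(url)
        visited.append(url)
        for target in graph.get(url, []):
            inlinks[target] += 1  # link count used as the "goodness" heuristic
            if target not in seen:
                heapq.heappush(heap, (-inlinks[target], target))
    return visited
```

    Priorities only grow, so the sketch pushes a fresh heap entry on every new link and skips outdated entries when they are popped. This avoids a decrease-key operation.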

    TUSTEP (TUebingen System of Text Processing Programs)
    Multilingual Textdata Processing and Fuzzy Searching
    http://www.uni-tuebingen.de/zdv/tustep/tdv_eng.html
     

    Search Engines Tools

    Web Site Search Tools
    http://www.searchtools.com/

    Web Site Search Tools - Related Topics


    Search Tools Product Listings
    http://www.searchtools.com/tools/tools.html

    Free Indexing and Searching Software

    Harvest-NG
    Harvest, an open-source project, has been re-implemented in Perl and can summarize documents in SOIF (Summary Object Interchange Format). This version saves the data in a database file and does not include a Broker or search engine, but it is entirely extensible.
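    SOIF records list attribute-value pairs. Each attribute carries the value's size in bytes, so values may span several lines. The sketch below writes and reads one record. It assumes the `@TEMPLATE { URL` header and `Name{size}:<tab>value` line layout that SOIF uses in Harvest. The helper names are hypothetical, and this is not Harvest-NG's Perl code.

```python
def to_soif(template, url, attrs):
    """Serialize one summary object as SOIF-style text (sketch)."""
    lines = [f"@{template} {{ {url}"]
    for name, value in attrs.items():
        size = len(value.encode("utf-8"))  # byte length, not characters
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


def from_soif(text):
    """Parse one record written by to_soif, trusting the byte sizes."""
    data = text.encode("utf-8")
    header_end = data.index(b"\n")
    header = data[:header_end].decode("utf-8")
    template, _, url = header[1:].partition(" { ")
    pos = header_end + 1
    attrs = {}
    while data[pos:pos + 1] != b"}":
        brace = data.index(b"{", pos)
        close = data.index(b"}", brace)
        name = data[pos:brace].decode("utf-8")
        size = int(data[brace + 1:close].decode("ascii"))
        start = close + 3  # skip "}:\t"
        attrs[name] = data[start:start + size].decode("utf-8")
        pos = start + size + 1  # skip the newline after the value
    return template, url, attrs
```

    Because the reader uses the declared size rather than scanning for the end of the line, multi-line values such as a description survive a round trip.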
    http://www.tardis.ed.ac.uk/harvest/ng/
    http://www.tardis.ed.ac.uk/harvest/ng/develop.shtml

    The Combine System for distributed indexing
    http://www.lub.lu.se/combine/
    http://www.ub.lu.se/~tsao/combine/

    Zebra Information Server
    Powerful free-text indexing and retrieval system, combined with a Z39.50 server. The Zebra server is freely available for noncommercial applications.
    http://www.indexdata.dk/zebra/

    Framework for Advanced Search (ASF)
    http://asf.gils.net/framework.html
    ASF Freeware
    http://asf.gils.net/freeware/index.html

    OCLC Z39.50 freely reusable code (C and Java)
    http://www.oclc.org/z39.50/#api

    Perlfect Search 3.01
    http://perlfect.com/freescripts/search/

    PLWeb Turbo has released a new version, 3.0 for Windows NT with improved performance, customization, web-crawling capability, and a browser-based interface.
    PLWeb and all PLS products are now freeware from AOL.
    http://www.pls.com/plweb.htm
    http://www.searchtools.com/tools/plweb.html

    AltaVista (Windows NT and Unix search tool) has just introduced a free version of AltaVista Search Intranet, Entry Level, which will index up to 3,000 pages.
    http://k2.altavista-software.com/intranet/3000_version/3000_overview.htm

    Commercial SW

    Ultraseek on Linux
    The Ultraseek search engine and the Content Classification Engine now run on Red Hat Linux 5.1 on a PC, with kernel 2.0.34 or later, or glibc 2.0.7-19 or later. Commercial.
    http://software.infoseek.com/products/ultraseek/ultratop.htm
    Download free trial version
    http://software.infoseek.com/download/download.htm
    http://www.searchtools.com/tools/ultraseek.html

    Ultraseek Content Classification Engine Product Information
    Commercial.
    http://software.infoseek.com/products/cce/ccetop.htm
    http://software.infoseek.com/products/cce/ccekey.htm

    Super Site Searcher is a Perl CGI script that works with other modules to create a searchable site directory. Commercial.
    http://www.hassan.com/site_searcher/
    http://www.searchtools.com/tools/supersitesearcher.html

    Extense is a powerful search engine developed in France that handles the inflection of French words (masculine/feminine and singular/plural). Commercial.
    http://www.searchtools.com/tools/extense.html

    Inxight LinguistX code library provides language identification, stemming and tokenization, among other features.
    http://www.searchtools.com/tools/inxight.html
    http://www.inxight.com/
    A collection of components for many languages that provide word and phrase analysis, stemming, tokenization, part-of-speech analysis, noun phrase extraction, language identification, summarization, etc.
    Platform: Windows 95 and NT, Solaris Sparc (will port to other Unix systems). Commercial.

    Verity products
    http://www.verity.com/products/index.html

    Knowledge Retrieval products
    http://www.verity.com/products/knowret1.html
     

    SE tips and links

    Search Engines links
    http://searchenginewatch.com/links/
    Contains the following sections:


    Search Tips and Tricks Advanced Searching
    http://websearch.tqn.com/msub21.htm?pid=2825&cob=home
    http://websearch.miningco.com/msub21.htm?pid=2825&cob=home

    Information Retrieval systems
    http://www.mri.mq.edu.au/%7Eeinat/web_ir/software.html

    Top search words and terms
    http://www.searchenginewatch.com/facts/searches.html

    Ask Jeeves Peek Through The Keyhole http://www.askjeeves.com/docs/peek/

    Weekly Search Engine Keyword Statistics For Web and Internet Marketing
    http://www.mall-net.com/se_report/

    Dogpile Top 200 Search Words
    http://www.eyescream.com/dogpiletop200.htm
    Top words from the meta-search engine Dogpile from January to July 1997. Unfortunately, the actual keyword phrases are not shown.

    Search Spy
    http://www.searchspy.com/
    This is a database of search terms available for desktop use. You enter a term, and the program scans to find matches. You can sort results by count or by keyword. Data is gathered from various live search displays.

    Life on the Internet, Finding Things
    http://www.screen.com/start/guide/searchengines.html

    useit.com: Jakob Nielsen's Website
    http://www.useit.com/
    He formulated a new approach to search engine design, LSD: Logo, Search, Directory.
     

    Search Engine Projects

    IBM's CLEVER Searching
    http://www.almaden.ibm.com/cs/k53/clever.html


    Web Archeology Project at Digital Research
    http://www.research.digital.com/SRC/personal/Krishna_Bharat/WebArcheology/
    Contains the following sections:


    The MetaWeb Project
    The aim of the Metadata Tools and Services project, known as MetaWeb, is to develop indexing services, tools and metadata element sets in order to promote the use and exploitation of metadata on the Internet.
    http://www.dstc.edu.au/Research/Projects/metaweb/

    DFN Indexing and Searching projects - http://www.dfn.de/links/suchen.html
    MetaGer (subject meta search), MESA (email address meta search), Level3 (search service for the DFN-Expo project), Search.de and Entry.de

    X.500 Directory E-mail Addresses Search (AMBIX-D) - http://ambix.uni-tuebingen.de:8889
     

    Search Engines Papers

    Research Papers related to Google!
    http://google.stanford.edu/google_papers.html


    Research Papers related to IBM CLEVER Searching Project
    http://www.almaden.ibm.com/cs/k53/clever.html
     

    Jon Kleinberg Homepage
    http://www.cs.cornell.edu/home/kleinber/
    Research and publications related to IBM's CLEVER Searching project.

    Other SE papers

    TREC Publications
    TREC (the Text REtrieval Conference) sponsored by NIST provides a set of realistic test collections, uniform scoring, unbiased evaluators and a chance to see the changes and improvements of search engines over time.
    Results are published in the proceedings of the annual conferences at http://trec.nist.gov/pubs.html
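    One of the measures reported in TREC evaluations is average precision. It is the mean of the precision values at the ranks where relevant documents appear. The sketch below computes it for one query. It assumes a ranked list of document IDs and a set of judged-relevant IDs, both hypothetical inputs.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list for one query.

    Sums precision@k at each rank k holding a relevant document, then
    divides by the total number of relevant documents, so relevant
    documents that were never retrieved count as zero.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0
```

    For a ranking `d1, d2, d3, d4` with `d1` and `d3` relevant, the precisions at the relevant ranks are 1/1 and 2/3, giving an average of about 0.83. Averaging this value over all test queries yields the familiar mean average precision.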

    Retrieval Performance in FERRET: A Conceptual Information Retrieval System
    Michael L. Mauldin
    Appeared at The 14th International Conference on Research and Development in Information Retrieval, Chicago, October 1991, ACM SIGIR.
    http://www.fuzine.com/mlm/sigir91.html

    Enhancing the World Wide Web
    Social Software for the Evolution of Knowledge
    http://www.islandone.org/Foresight/WebEnhance/index.html

    Learning Webs by J. Bollen and F. Heylighen
    http://pespmc1.vub.ac.be/LEARNWEB.html
    Hebbian learning can be implemented on the web by changing the strength of links depending on how often they are used. The paper explores the "brain" metaphor for making the web more intelligent. The basic idea is that web links are similar to associations in the brain, as supported by synapses connecting neurons. The strength of the links, like the connection strength of synapses, can change depending on the frequency of use of the link. This allows the network to "learn" automatically from the way it is used.
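    The idea can be sketched in Python. Each time a link is followed, its weight grows, while the other links from the same page fade slightly. Frequently used links therefore rise in the ranking. The class and the `reward` and `decay` parameters are illustrative assumptions. This frequency-only rule is not the set of learning rules defined in the paper.

```python
from collections import defaultdict


class HebbianLinks:
    """Link weights that grow with use and slowly fade when unused.

    Simplified, frequency-only sketch: `reward` and `decay` are
    illustrative parameters, not values taken from the paper.
    """

    def __init__(self, reward=1.0, decay=0.1):
        self.reward = reward
        self.decay = decay
        self.weights = defaultdict(dict)  # page -> {target: strength}

    def traverse(self, source, target):
        out = self.weights[source]
        for other in out:
            if other != target:
                out[other] *= 1.0 - self.decay  # unused links weaken
        out[target] = out.get(target, 0.0) + self.reward  # used link strengthens

    def ranked_links(self, source):
        """Targets of `source`, strongest link first."""
        out = self.weights[source]
        return sorted(out, key=out.get, reverse=True)
```

    After a user follows `home -> a` once and `home -> b` twice, `b` carries a weight of 2.0 and `a` has decayed to about 0.81. A page could then list its links in that learned order.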

    Identification, location and versioning of web-resources. URI Discussion paper. Version 1.0. 12 March 1999
    Titia van der Werf-Davelaar
    http://www.konbib.nl/donor/rapporten/URI.html
    This document is a discussion document for use in developing a consensus on practical approaches to be pursued for better information management techniques and methods on the Web.
    This work is done in the context of the following projects: DONOR, DESIRE, NEDLIB.

    Report on the WWW8 conference by Nicky Ferguson
    http://www.ilrt.bris.ac.uk/~ecnf/www8.html

    Semantic Web vision paper
    Alexander Chislenko. - Version 0.28 - 29 June, 1997
    http://www.lucifer.com/~sasha/articles/SemanticWeb.html

    SE Legal issues

    Lycos GENERAL TERMS AND CONDITIONS -
    http://www.lycos.com/lycosinc/legal.html
     


