Standardisation in Indexing, Searching and Information Retrieval
Recent developments in Indexing, Searching and Information Retrieval Technologies

IETF Work: Common Indexing Protocol
IETF work: other standards
Metadata and XML/RDF

This page is updated regularly, please send your suggestions to:


W3C Web Content accessibility initiative (WAI)
Web Content accessibility Guidelines

Web Architecture: Describing and Exchanging Data
W3C Note 7 June 1999
Describes building a space where automated agents can contribute, an early step toward the Semantic Web. The RDF Schema and XML Schema designs began independently; the note proposes a common model in which they fit together as interlocking pieces of Semantic Web technology.

Composite Capability/Preference Profiles (CC/PP): A user side framework for content negotiation
W3C Note 27 July 1999
In this note we describe a method for using RDF, the Resource Description Framework of the W3C, to create a general, yet extensible framework for describing user preferences and device capabilities. This information can be provided by the user to servers and content providers. The servers can use this information describing the user's preferences to customize the service or content provided. The ability of RDF to reference profile information via URLs assists in minimizing the number of network transactions required to adapt content to a device, while the framework fits well into the current and future protocols being developed at the W3C and the WAP Forum.

International Layout
W3C Working Draft 26-July-1999
The following specification extends CSS to support East Asian and Bi-directional text formatting.

Platform for Privacy Preferences (P3P) Specification
W3C Working Draft 7 April 1999
This document describes the Platform for Privacy Preferences (P3P). P3P enables Web sites to express their privacy practices and enables users to exercise preferences over those practices.

POIX: Point Of Interest eXchange Language Specification
W3C Note - 24 June 1999
The "POIX" proposed here defines a general-purpose specification language for describing location information, which is an application of XML (Extensible Markup Language). POIX is a common baseline for exchanging location data via e-mail and embedding location data in HTML and XML documents. This specification can be used by mobile device developers, location-related service providers, and server software developers.

Annotation of Web Content for Transcoding
W3C Note 10 July 1999
This proposal presents annotations that can be attached to HTML/XML documents to guide their adaptation to the characteristics of diverse information appliances. It also provides a vocabulary for transcoding, and syntax of the language for annotating Web content. Used in conjunction with device capability information, style sheets, and other mechanisms, these annotations enable a high quality user experience for users who are accessing Web content from information appliances.

XML Schema Part 1: Structures
W3C Working Draft 6-May-1999
XML Schema: Structures is part one of a two-part draft of the specification for the XML Schema definition language. This document proposes facilities for describing the structure and constraining the contents of XML 1.0 documents. The schema language, which is itself represented in XML 1.0, provides a superset of the capabilities found in XML 1.0 document type definitions (DTDs).

XML Schema Part 2: Datatypes
World Wide Web Consortium Working Draft 06-May-1999
This document specifies a language for defining datatypes to be used in XML Schemas and, possibly, elsewhere.

XHTML™ 1.0: The Extensible HyperText Markup Language
A Reformulation of HTML 4.0 in XML 1.0
W3C Working Draft 5th May 1999
This specification defines XHTML 1.0, a reformulation of HTML 4.0 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4.0. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4.0. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML user agents is possible by following a small set of guidelines.

Document Object Model (DOM) Level 2 Specification
Version 1.0
W3C Working Draft 19 July, 1999
This specification defines the Document Object Model Level 2, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model Level 2 builds on the Document Object Model Level 1.
This release of the Document Object Model Level 2 has all of the interfaces that the final version is expected to have. It contains interfaces for creating a document, importing a node from one document to another, supporting XML namespaces, associating stylesheets with a document, the Cascading Style Sheets object model, the Range object model, filters and iterators, and the Events object model. The DOM WG wants to get feedback on these, and especially on the two options presented for XML namespaces, so that final decisions can be made for the DOM Level 2 specification.
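Several of the interfaces listed above (creating a document, importing a node from one document to another, namespace-aware element creation) can be tried with Python's xml.dom.minidom, which implements a subset of the DOM. The sketch below is only an illustration under that assumption: the element names and the namespace URI are invented for the example, and the behaviour is that of minidom, not necessarily that of the July 1999 draft.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

# Create two independent documents through the DOMImplementation interface.
source = impl.createDocument(None, "catalog", None)
target = impl.createDocument(None, "archive", None)

# Namespace-aware element creation. The namespace URI is a made-up example.
NS = "http://example.org/ns/records"
record = source.createElementNS(NS, "rec:record")
record.setAttribute("id", "r1")
record.appendChild(source.createTextNode("Common Indexing Protocol"))
source.documentElement.appendChild(record)

# Import a deep copy of the node into the other document, then attach it.
# The original node stays where it was in the source document.
copy = target.importNode(record, True)
target.documentElement.appendChild(copy)

print(target.documentElement.toxml())
```

The imported copy is owned by the target document, while the original node remains part of the source document.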

IBM online XML education courses

IETF Work: Common Indexing Protocol

RFC 2651: The Architecture of the Common Indexing Protocol (CIP)
J. Allen, M. Mealling
This document describes the CIP framework, including its architecture and the protocol specifics of exchanging indices.

RFC 2652: MIME Object Definitions for the Common Indexing Protocol (CIP)
J. Allen, M. Mealling
This document describes the definitions of those objects as well as the methods and requirements needed to define a new index type.

RFC 2653: CIP Transport Protocols
J. Allen, P. Leach, R. Hedberg
This document specifies three protocols for transporting CIP requests, responses and index objects, utilizing TCP, mail, and HTTP.

RFC 2654: A Tagged Index Object for use in the Common Indexing Protocol
R. Hedberg, B. Greenblatt, R. Moats, M. Wahl
This document defines a mechanism by which information servers can exchange indices of information from their databases by making use of the Common Indexing Protocol (CIP). This document defines the structure of the index information being exchanged, as well as the appropriate meanings for the headers that are defined in the Common Indexing Protocol.

RFC 2655: CIP Index Object Format for SOIF Objects
T. Hardie, M. Bowman, D. Hardy, M. Schwartz, D. Wessels
This document describes SOIF, the Summary Object Interchange Format, as an index object type in the context of the CIP framework.
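As a rough illustration of what a SOIF object looks like on the wire, the sketch below serializes one record. It assumes the commonly described SOIF layout (a template type and URL on the first line, then one attribute per line with the value's byte count in braces, a colon and a tab before the value); RFC 2655 remains the authoritative definition. The function name soif_object and the sample attributes are hypothetical.

```python
def soif_object(template_type, url, attributes):
    """Serialize one SOIF-style object (illustrative sketch, not normative)."""
    lines = ["@%s { %s" % (template_type, url)]
    for name, value in attributes.items():
        # The number in braces is the size of the value in bytes.
        size = len(value.encode("utf-8"))
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines) + "\n"


example = soif_object(
    "FILE",
    "http://example.org/report.html",
    {"Title": "Annual Report", "Type": "HTML"},
)
print(example)
```

Carrying an explicit byte count lets a reader consume each value without scanning for a terminator, so values may themselves contain newlines.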

RFC 2656: Registration Procedures for SOIF Template Types
T. Hardie
The registration procedure described in this document is specific to SOIF template types.

RFC 2657: LDAPv2 Client vs. the Index Mesh
R. Hedberg
LDAPv2 clients as implemented according to RFC 1777 have no notion of referrals. The integration between such a client and an Index Mesh, as defined by the Common Indexing Protocol, depends heavily on referrals and therefore needs to be handled in a special way. This document defines one possible way of doing this.

IETF work: other standards

RFC 2616: Hypertext Transfer Protocol - HTTP/1.1
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee
HTTP has been in use by the World-Wide Web global information initiative since 1990. This specification defines the protocol referred to as "HTTP/1.1", and is an update to RFC 2068.

LDAP Service Deployment BOF (lsd2) at IETF45 in Oslo
The purpose of this BOF is to examine the question of deploying LDAP services beyond the context of a single service provider.
The BOF included presentations on TISDAG, NDD and the DESIRE 2 Directory Service. Suggested areas of activity: description of service models for large-scale directories, schema recommendations (including i18n/l10n issues), and client extensions.

Technical Specification The Norwegian Directory of Directories (NDD)
R.Hedberg, H.Alvestrand
This specification describes what is proposed to be the necessary infrastructure to provide a national directory server infrastructure in Norway for publicly accessible directory servers.

Technical Infrastructure for Swedish Directory Access Gateways (TISDAG)
Leslie Daigle, Roland Hedberg
The strength of the TISDAG project's DAG proposal is that it defines the necessary technical infrastructure to provide a single-access-point service for information on Swedish Internet users.  The resulting service will provide uniform access for all information -- the same level of access to information (7x24 service), and the same information made available, irrespective of the service provider responsible for maintaining that information, their directory service protocols, or the end-user's client access protocol.

An Architecture for Integrated Directory Services
Leslie Daigle, Thommy Eklof
Drawing from experiences with the TISDAG ([TISDAG]) project, this document outlines an approach to providing the necessary infrastructure for integrating such widely-scattered servers into a single service, rather than attempting to mandate a single protocol and schema set for all participating servers to use.
The proposed architecture inserts a coordinated set of modules between the client access software and participating servers.  While the client software interacts with the service at a single entry point, the remaining modules are called upon (behind the scenes) to provide the necessary application support.  This may come in the form of modules that provide query proxying, schema translation, lookups, referrals, security infrastructure, etc.

Use of Language Codes in LDAP
M. Wahl, T. Howes
The Lightweight Directory Access Protocol (LDAPv3, RFC 2251) provides a means for clients to interrogate and modify information stored in a distributed directory system.  The information in the directory is maintained as attributes (RFC 2252) of entries.  Most of these attributes have syntaxes which are human-readable strings, and it is desirable to be able to indicate the natural language associated with attribute values.
This document describes how language codes (RFC 1766) are carried in LDAP and are to be interpreted by LDAP servers.  All implementations MUST be prepared to accept language codes in the LDAP protocols.  Servers may or may not be capable of storing attributes with language codes in the directory.  This document does not specify how to determine whether particular attributes can or cannot have language codes.
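A small sketch of how a client might separate a language code from an attribute description. It assumes the convention, used in this line of LDAP work, of carrying the code as an attribute option prefixed with "lang-" (for example givenName;lang-en-US); the helper name split_language_option is hypothetical and the draft itself is the authority on the exact syntax.

```python
def split_language_option(attribute_description):
    """Split 'type;option;...' into (type, language code or None, other options).

    Assumes language codes appear as options of the form 'lang-<code>'.
    """
    base, *options = attribute_description.split(";")
    language = None
    others = []
    for option in options:
        if option.lower().startswith("lang-"):
            language = option[len("lang-"):]
        else:
            others.append(option)
    return base, language, others


print(split_language_option("givenName;lang-en-US"))
print(split_language_option("cn"))
```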

Uniform Object Locator - UOL
J. Boynton
A Uniform Object Locator (UOL) provides a hierarchical "human-readable" format for describing the location of any single attribute within any data object. A UOL emulates the internal structure of a data object by dividing a partial URL into two re-usable components: an object constructor and an object name.
The UOL format is particularly suited for retrieval and storage of parameter values through multiple object layers. Its basic construction allows it to be combined with a URL without modification. Possible uses include distributed object management, XML, and e-business development.

Context and Goals for Common Name Resolution
Larry Masinter, Michael Mealling, Nicolas Popp, Karen Sollins
This document establishes the context and goals for a Common Name Resolution Protocol.

Internationalized Uniform Resource Identifiers (IURI)
Larry Masinter, Martin Duerst

Tags for the Identification of Languages
H. Alvestrand
This document describes a language tag for use in cases where it is desired to indicate the language used in an information object. It also defines a Content-language: header, for use in the case where one desires to indicate the language of a document.
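The sketch below checks language tags and splits a Content-language header value into its tags. It assumes the RFC 1766 shape of a tag (a primary tag and optional subtags, each one to eight letters, joined by hyphens) and a comma-separated header value; it does not handle header comments, and the function name parse_content_language is hypothetical.

```python
import re

# Primary tag and subtags: 1 to 8 letters each, separated by hyphens.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def parse_content_language(value):
    """Return the language tags in a header value, lower-cased for comparison."""
    tags = [part.strip() for part in value.split(",") if part.strip()]
    for tag in tags:
        if not LANGUAGE_TAG.fullmatch(tag):
            raise ValueError("not a well-formed language tag: %r" % tag)
    # Tags are compared case-insensitively, so normalise to lower case.
    return [tag.lower() for tag in tags]


print(parse_content_language("no-nynorsk, no-bokmaal"))
print(parse_content_language("en-US"))
```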

RFC 2611: URN Namespace Definition Mechanisms
L. Daigle, D. van Gulik, R. Iannella, P. Faltstrom

i18n and Multilingual support in Internet mail: Standards Overview
Yuri Demchenko

Other Standardisation

Search Engine Standards Project

Domain Restriction Proposal

Standard for Robot Exclusion

Robots META Tag

Metadata and XML/RDF


RFC 2413: Dublin Core Metadata for Resource Discovery

Encoding Dublin Core Metadata in HTML
Internet Draft

Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)

Resource Description Framework - RDF

W3C Resource Description Framework (RDF) Model and Syntax - recommendation

W3C Resource Description Framework (RDF) Schemas - proposed recommendation

Resource Description Framework (RDF)

Metadata and Resource Description

Dublin Core

Dublin Core Metadata Element Set: Reference Description

User Guide Working Draft 1998-07-31

1999-07-02: Dublin Core Elements, Version 1.1 moves to Proposed Recommendation
The Dublin Core Directorate is pleased to announce that a set of revised element definitions (Dublin Core Elements, Version 1.1) has been completed and is available for public review and comment as a Proposed Recommendation of the Dublin Core Metadata Initiative.

CEN/ISSS Workshop on MMI (Metadata for Multimedia Information)

CEN/ISSS Metadata Framework, edited by Stewart Granger

CEN/ISSS' The European XML/EDI Pilot Project

The Role of the XML/EDI Guidelines

Guidelines for using XML for Electronic Data Interchange, Version 0.05, 25th January 1998

The Global Repository Initiative

White Paper on XML Repositories for XML/EDI

Dublin Core/MARC/GILS Crosswalk
Network Development and MARC Standards Office

Character Set and Language Negotiation (2) in Z39.50

Registry of Z39.50 Object Identifiers

Metadata.Net - Metadata Tools and Services

Meta Data Coalition

An Introduction to the Meta Data Coalition's Initiatives

Open Information Model
MDC OIM Version 1.0 review draft, April 1999

OIM proposed models
Knowledge Description Model

Meta Data Interchange Specification MDIS Version 1.1

Metadata/RDF Resources and Publications

Metadata Resources at UKOLN

Prototype Metadata Registry for DESIRE project

RDF Tools - Briefing document

DC News, 1999-08-18
CIMI announces the release of the Guide to Best Practice: Dublin Core. The document is one important result of the Dublin Core Testbed, an ongoing effort to explore the usability, simplicity, and technical feasibility of Dublin Core for museum information. The Guide addresses Dublin Core 1.0 as documented in RFC 2413.

New Metadata Handbook from European Schoolnet
1st December 1998
Describes a metadata element set that has been extended with a range of additional local (sub)elements from other metadata initiatives, including IMS (Instructional Management System) and the ARIADNE set (Alliance of Remote Instructional Authoring and Distribution Network for Europe).
The EUN metadata harmonisation is happening in close co-operation with EUC (European Universal Classroom), which has been studying DBS/GER (Deutscher Bildungs-Server / German Educational Resources), GEM (The Gateway to Educational Materials) and EdNA (Education Network Australia). The handbook provides a guideline for creating and publishing metadata, a presentation of the syntax and a thorough description of each of the EUN elements.

Dave Beckett's Resource Description Framework (RDF) Resources

Automatic RDF Metadata Generation for Resource Discovery
Charlotte Jenkins, Mike Jackson, Peter Burden, Jon Wallis

Classifier/metadata generator Demo

Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies
Michael Buckland, with Aitao Chen, Hui-Min Chen, Youngin Kim, Byron Lam, Ray Larson, Barbara Norgard, and Jacek Purat

XML Searching

Building a XML-based Metasearch Engine on the Server

GoXML Search Engine v1.0 - BETA is an XML context-based search processor, with online documentation and a demonstration available. The GoXML Project was launched to create a new breed of search vehicle that can index, store and allow accurate searching of XML data. Its primary focus is to give XML developers a tool to locate XML documents on the Internet.
