20. RECOMMENDATIONS

The preceding chapters described the development and present state of characters, character repertoires, and their coding. This chapter contains recommendations for the present and future use of coded character sets in the public sector in the Netherlands. The point of departure is the recommendation included in NEN standards 1888 and 5825 on the interchange of personal data and addresses. This recommendation became mandatory for the civil service on 1 January 1993 and is followed on a voluntary basis by other branches of public administration and by the Social Security, Health Care and Police sectors.

The recommendation states that if the hardware or software supports an ISO coding system, that is, one in accordance with the ISO 2022 structure, it shall use, as functional needs require, characters coded in conformance with one of the following standards:

for the basic set: NEN-ISO/IEC 646

for Latin-5: ISO 8859-9

for Teletex: ISO/IEC 6937

These standards nevertheless have their limitations.

NEN-ISO/IEC 646 has only a limited repertoire (94 characters, of which 26+26 are letters) and is therefore poorly suited to systems with which organisations must address their clients correctly.

ISO 8859-9 (Latin alphabet No. 5) has a larger repertoire (191 characters, of which 116 are letters) that covers the current Western European languages and Turkish.

ISO/IEC 6937 (technically equivalent to CCITT (now ITU-T) T.51, of which the recently withdrawn T.61 is a large subset containing all the letters) has an extensive repertoire which, as far as is known, completely covers the needs in the Netherlands in the field of personal data. On the other hand, this coded character set has the disadvantage that its coding takes the form of a mixed single/double byte construct. From a technical point of view this is quite inconvenient, not to say unworkable, for data processing, which is why the method is used as little as possible for storing personal data. In the GBA system, for example, T.61 is applied only for interchanging data, because that has been made mandatory. For recording in local and client databases, a coding system with a uniform two bytes per character is usually adopted. This, however, is a non-standard approach, so the coding may differ considerably between organisations or software suppliers.
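The difference between the mixed single/double byte construct and a uniform two-byte coding can be illustrated with a minimal Python sketch. It rests on several assumptions: the function names are hypothetical, the table holds only five of the non-spacing diacritical marks of ISO/IEC 6937 (which precede their base letter), and UTF-16BE merely stands in for "a coding with uniformly two bytes per character"; it is not the coding any particular GBA supplier uses.

```python
import unicodedata

# Illustrative subset of the ISO/IEC 6937 non-spacing diacritic bytes,
# mapped to the equivalent Unicode combining marks (assumption: only
# these five marks are handled in this sketch).
NON_SPACING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
}


def decode_6937_subset(data: bytes) -> str:
    """Decode bytes where an accented letter is diacritic byte + base letter."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in NON_SPACING:
            # Double-byte case: the diacritic comes first, then the base letter.
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError(f"diacritic at offset {i} has no base letter")
            base = chr(data[i + 1])
            out.append(unicodedata.normalize("NFC", base + NON_SPACING[byte]))
            i += 2
        elif byte < 0x80:
            # Single-byte case: the basic set.
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch")
    return "".join(out)


def to_fixed_two_octets(text: str) -> bytes:
    """Re-code with exactly two octets per character (valid for BMP characters)."""
    return text.encode("utf-16-be")


name = decode_6937_subset(b"Ren\xc2e")   # 5 bytes for 4 characters
fixed = to_fixed_two_octets(name)        # 8 bytes for 4 characters
```

In the variable-length form the n-th character cannot be located without scanning from the start of the field, whereas in the fixed form it always begins at byte offset 2n. That is the practical reason given above for preferring a uniform width in files for processing.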

In the EDIFACT standard (ISO 9735), until recently, only two rather limited character sets were permitted. At the end of 1992 these were extended with four further repertoires, as follows:

that of ISO 8859-1 - Latin alphabet No. 1 (Western European languages)
that of ISO 8859-2 - Latin alphabet No. 2 (Eastern European languages; like No. 1, it also covers English and German)
that of ISO 8859-5 - Latin/Cyrillic (Cyrillic for European languages)
that of ISO 8859-7 - Latin/Greek

This addition is certainly an important functional extension, but it does not solve all problems, because facilities for Turkish are still lacking.

For medium-term planning, the standard ISO/IEC 10646 - Universal Multiple-Octet Coded Character Set - Part 1 - Basic Multilingual Plane, published in May 1993, may offer a good and permanent solution. The repertoire of this standard includes the characters needed for writing the scripts in use with modern languages throughout the world (and a number of historical forms), including Chinese and Japanese (a few scripts have yet to be added). Because of the sheer size of the repertoire, a more limited subset could satisfy European needs at reduced cost. This subject is now under study by the national standards institutes in Europe.

The preceding leads to the following recommendations:

1. When selecting a standard to adhere to, distinguish between the character repertoire and its coding.

2. Let the choice of a standard depend on the functional needs that arise from processing personal data, and do not base it entirely on the present possibilities of the current automated system or the installed input/output hardware. Anticipate a stepwise transition. When determining functional needs, pay attention to priorities, be it visual display of data, storage and processing in files, data interchange, or a combination of these.

3. Take as orientation, if at all feasible, the solution that ISO/IEC 10646 offers in the long term. The European Subset, and within it the part for Latin script, should satisfy nearly all functional needs and technical requirements for automation. Implementing this guideline requires that hardware and software for processing characters coded with two octets have become available. At that point facilities for dealing with non-Latin scripts may not yet have been provided. But a Netherlands citizen, or even a civil servant, cannot be expected or required to be able to read or work with these scripts. Where such a situation cannot otherwise be avoided, transliteration is to be applied in accordance with established rules.

4. For the transition period to the permanent solution outlined under 3., the following recommendations apply:

Should 7-bit coding only be available:

Select ISO/IEC 646:1991 IRV.

In that case there is no provision for:

letters with accents
non-Latin script

Should 8-bit coding be available:

Select ISO 8859-9 (Latin-5).

In that case there is no provision for:

Icelandic
"Eastern European"
Vietnamese
non-Latin script

Should there be a situation where Latin-5 is not sufficient, considering functional needs for specific language areas in Europe:

Select ISO 8859-2 (Latin-2).

In that case there is no provision for:

French
"Northern and Southern European"
Vietnamese
non-Latin script

Should the technical possibilities of the supplying manufacturer not permit implementing Latin-5, but only Latin-1:

Select ISO 8859-1 (Latin-1).

In that case there is no provision for:

Turkish (the positions of some Turkish letters are occupied by the corresponding Icelandic letters)

Should functional needs for specific language areas in Europe call for a provision for Greek or Cyrillic script:

Select ISO 8859-5 (Latin/Cyrillic) or
ISO 8859-7 (Latin/Greek).

In Netherlands practice the choice will usually fall on Latin alphabet No. 5. This is justified because it offers a reasonably complete repertoire for the Western European languages, including Frisian, while the Turkish language (Turkish names in personal records) can also be represented. The special letters for Icelandic are then not available, but it is customary to transliterate these.

Should communication with GBA be required:

CCITT T.51/T.61 shall be used for the communication (GBA regulation, "GBA-voorschrift").
T.51/T.61 is not suitable for creating files for processing, and provides coding for Latin script only.
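The transition choices under recommendation 4 can be checked against sample data with a short Python sketch. It is an illustration under assumptions, not part of the recommendation: the function names and the sample names are hypothetical, and Python's built-in codecs (with "ascii" standing in for ISO/IEC 646:1991 IRV) are taken as faithful renderings of the standards named above.

```python
# Coded character sets from the transition recommendations, mapped to the
# names of the corresponding Python codecs (assumption: "ascii" is used
# for ISO/IEC 646:1991 IRV).
CANDIDATES = {
    "ISO/IEC 646 IRV": "ascii",
    "ISO 8859-1 (Latin-1)": "latin_1",
    "ISO 8859-2 (Latin-2)": "iso8859_2",
    "ISO 8859-9 (Latin-5)": "iso8859_9",
}


def sets_covering(text: str) -> list[str]:
    """Return the candidate sets whose repertoire contains every character."""
    covering = []
    for label, codec in CANDIDATES.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is outside this repertoire
        covering.append(label)
    return covering


def latin1_latin5_differences() -> dict[int, tuple[str, str]]:
    """Map each code position where Latin-1 and Latin-5 differ to both letters."""
    diffs = {}
    for code in range(0xA0, 0x100):
        raw = bytes([code])
        in_latin1 = raw.decode("latin_1")
        in_latin5 = raw.decode("iso8859_9")
        if in_latin1 != in_latin5:
            diffs[code] = (in_latin1, in_latin5)
    return diffs


print(sets_covering("Yıldız"))   # Turkish name
print(sets_covering("Þór"))      # Icelandic name
print(sets_covering("Hélène"))   # French name
print(sets_covering("Wałęsa"))   # Polish name
for code, (l1, l5) in latin1_latin5_differences().items():
    print(f"0x{code:02X}: Latin-1 {l1}  Latin-5 {l5}")
```

Running such a check over the names in an existing file gives a quick indication of which set meets the functional needs. The second function lists the code positions at which Latin-1 holds Icelandic letters and Latin-5 holds Turkish ones, which is why data coded in one set must not be read as the other.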
