9. TEXTCOMMUNICATION, TELETEX, VIDEOTEX AND ISO 6937

Where the application of SHIFTs did not provide a satisfactory solution to the capacity problem of the codetables, a different approach promised more success. Because many of the extra letters desired could be seen as a simple letter carrying a diacritical mark, it seemed a good idea to code the two, letter and diacritic, separately. ISO 646 already had a similar facility: the sequence Letter Backspace Accent resulted in a single visible symbol by "overprinting". But Accent Backspace Letter created the same effect. Do two different codings then represent one and the same character? And is an E still an E, or the combination of F with Low Line?
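The ambiguity can be made concrete with a small sketch. It models a printer that strikes glyphs at print positions and moves back one position on Backspace; the function name `struck_cells` and the use of an apostrophe as the accent are assumptions made for illustration only, not part of ISO 646.

```python
BS = "\b"  # Backspace control character

def struck_cells(text):
    """Return, per print position, the set of glyphs struck there."""
    positions = []
    col = 0
    for ch in text:
        if ch == BS:
            col = max(col - 1, 0)   # move back, never before column 0
            continue
        while len(positions) <= col:
            positions.append(set())
        positions[col].add(ch)      # overprint on whatever is already there
        col += 1
    return [frozenset(p) for p in positions]

letter_first = "e" + BS + "'"   # Letter Backspace Accent
accent_first = "'" + BS + "e"   # Accent Backspace Letter

# Two different codings ...
print(letter_first == accent_first)
# ... that put the same marks on paper.
print(struck_cells(letter_first) == struck_cells(accent_first))
```

A program that compares the coded strings sees two different values, while a reader of the printed page sees one symbol. The sketch says nothing about the second question (F with Low Line against E), which depends on the shape of the glyphs rather than on the coding.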

To avoid pitfalls like that, strict rules are necessary. In ISO 6937 a character is either a single letter or the combination of a letter with a diacritic. Only those combinations that occur in a list, the "repertoire" of ISO 6937, are legal. The diacritic shall precede the letter, but is not a character in itself. A diacritic as a free-standing character is coded by placing a space after the byte that represents the "diacritical mark". In this way some characters are coded with one byte, others with two. The number of codeable characters is finite: the 333 of the repertoire.
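The coding rule described above can be sketched as a small decoder. This is an illustration under stated assumptions, not an implementation of the standard: the three byte values in `DIACRITICS` are assumed positions for the acute, diaeresis and ring marks, only 7-bit letters are handled, and no check against the repertoire is made. The name `decode_6937_sketch` is hypothetical.

```python
import unicodedata

# Assumed byte values for three diacritical marks (illustrative subset).
# Each maps to (combining form, free-standing form) in Unicode.
DIACRITICS = {
    0xC2: ("\u0301", "\u00B4"),  # acute accent
    0xC8: ("\u0308", "\u00A8"),  # diaeresis
    0xCA: ("\u030A", "\u02DA"),  # ring above
}

def decode_6937_sketch(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS:
            if i + 1 >= len(data):
                raise ValueError("diacritical mark without a following byte")
            combining, spacing = DIACRITICS[b]
            follower = chr(data[i + 1])
            if follower == " ":
                # mark followed by space: the diacritic as a character by itself
                out.append(spacing)
            else:
                # mark followed by letter: one character coded with two bytes
                out.append(unicodedata.normalize("NFC", follower + combining))
            i += 2
        elif b < 0x80:
            out.append(chr(b))       # one-byte character
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not covered by this sketch")
    return "".join(out)

print(decode_6937_sketch(b"caf\xC2e"))  # five bytes, four characters
```

Note that the mark comes first in the byte stream, the opposite of the order used for combining characters in Unicode, so the decoder has to swap the pair before composing it.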

To make this scheme work, a considerable number of letters had to be split into parts. This raised the question of what is a diacritic and what is not. The Icelandic letters þ and ð were not suited to decomposition. Next to a "primary" set with the usual letters, a "supplementary" set was required for letters of this kind. The "diacritical marks" were put together in a column of the codetable for this set. Thus there are three kinds of coded characters in ISO 6937: those from the primary set, those from the supplementary set (all represented by one byte), and those others from the repertoire that are coded with two bytes, one for a diacritical mark and one for a letter. The assignment of letters to one or the other category sometimes looks arbitrary. The Danish O WITH STROKE is taken for a single letter, but the A WITH RING ABOVE is split, though the ring is normally connected to the letter.

This "mixed single/double byte" system is extremely awkward for data processing. Programs work with bytes. The number of bytes in a string is no longer identical to the number of characters. Fields in a record are measured in positions, that is in whole bytes, not in letters that may take one or two bytes depending on their kind. Thus it cannot be predicted how many characters will fit in a field of a fixed number of positions. Column-wise sorting becomes cumbersome.
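The mismatch between bytes and characters can be illustrated with a short sketch. It assumes that the diacritical marks occupy the byte range 0xC0 to 0xCF (one column of the supplementary set) and that the input is well formed; the function names are hypothetical.

```python
def is_diacritic(b: int) -> bool:
    # Assumption: the diacritical marks occupy one column, 0xC0..0xCF.
    return 0xC0 <= b <= 0xCF

def char_count(data: bytes) -> int:
    """Count characters: a mark plus its letter counts as one."""
    count = 0
    i = 0
    while i < len(data):
        i += 2 if is_diacritic(data[i]) else 1
        count += 1
    return count

def fit_field(data: bytes, width: int) -> bytes:
    """Fill a fixed-width field without cutting a two-byte character in half."""
    end = 0
    i = 0
    while i < len(data):
        step = 2 if is_diacritic(data[i]) else 1
        if i + step > width:
            break                    # next character no longer fits
        i += step
        end = i
    return data[:end].ljust(width, b" ")

plain = b"cafe"
accented = b"caf\xC2e"               # same word with an acute on the e

print(len(plain), char_count(plain))        # bytes and characters agree
print(len(accented), char_count(accented))  # one byte more, same characters
print(fit_field(plain, 4), fit_field(accented, 4))
```

A four-position field holds all of the unaccented word but only three characters of the accented one, and a program that simply cut the data after four bytes would leave a diacritical mark without its letter.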

ISO 6937 was issued in 1983 in two "parts"; others were to follow. That never happened, and both parts have now been revised and published as a single ISO/IEC 6937 (1994). Because of the problems indicated, ISO 6937 has never been accepted in the software world. Programming languages simply ignored this standard, with the result that data coded with the 6937 method cannot be processed with a Cobol program. The users' dissatisfaction subsequently led to the development of ISO 8859. There was in JTC1/SC2 even a serious proposal to suppress ISO 6937 completely, and to leave support for the implementations to CCITT.

To avoid misleading users, ISO 6937 is called a standard for "text communication". As such it has become an ingredient of systems like TELETEX (CCITT Recommendation T.61) and VIDEOTEX (CCITT Recommendation T.101), which are generally intended to communicate text for reading only, with no further processing envisaged.

CCITT has aligned the newer versions of its Recommendations in the T-series to the ISO Standards. At present the following have been issued (CCITT changed its name to ITU-T recently):

CCITT Recommendation T.50 (1992), International Reference Alphabet (equivalent in technical content with ISO/IEC 646:1991).
CCITT Recommendation T.51 (1992), Latin based coded character sets for Telematic services (equivalent in technical content with ISO/IEC 6937:1994).
CCITT Recommendation T.52 (1993), Non-Latin coded character sets for Telematic services.
CCITT Recommendation T.53 (1993), Character coded control functions for Telematic services.
CCITT Recommendation T.61 (1988), Character repertoire and coded character sets for the international Teletex service.
CCITT Recommendation T.101 (1993), International interworking for Videotex services.

Of these, T.51 contains the repertoire and the coding method of ISO 6937. T.61 and T.101 each have a repertoire of their own, a subset of that of T.51. Because T.51 is the reference document, and because there is no need to duplicate information on the coding of characters, the coding part of T.61 and T.101 will be removed in the future. Referring to T.61 instead of T.51 thus makes little sense. Deletion of T.61 has now been announced by ITU-T.

The characters that are omitted in T.61 or T.101 are all specials, not letters. A specification of the characters that do occur in ISO 6937, but not in ISO 8859, is given in the Annexes. Of these, three letters (LL63, LL64, LN63) have been declared deprecated: they never existed and were the result of a misunderstanding. A number of letters for Welsh are included in ISO 6937, but not all.