8. CODE EXTENSION, ISO 2022 AND 2375, ISO 4873 AND 10367

When it became increasingly obvious that 26 letters were not sufficient to write national languages correctly, and that the 94 available positions offered too few possibilities even with alternative codetables, a different route had to be taken. The result was ISO 2022. It has become a very complex standard that has never been implemented in its entirety.

The aim of ISO 2022 is to describe the structure of a family of codetables, to make each of these tables identifiable, and to use these facilities to switch tables during communication. The design was based on 7-bit bytes and was later extended to 8-bit bytes.

A codetable for 7-bit coding consists of 8*16=128 positions, arranged in 8 columns numbered 0-7 with 16 rows each. Columns 0 and 1 are meant for control characters, called the C0 set; position 2/0 is SPACE and 7/15 is DELETE. The other 94 positions are for graphic characters and form the GL area. With 8-bit coding there are 128 additional positions, columns 8-15. The same structure applies here: columns 8-9 contain the C1 set of controls, and columns A-F (10-15) constitute the GR area for graphics. The character set placed in GL is called G0, that in GR is called G1. G1 may contain either 96 or 94 characters; in the latter case two positions of GR remain unused.

So much for the structure. If one resorts to code extension, two extra sets G2 and G3 are available, which can take the place of G0 or G1 once an "invocation" has been performed. Invocation takes place when a LOCKING SHIFT control character is read; there are seven of these. The meaning of a byte is thereby changed and made dependent on the current state of the reading device: we are dealing with a "finite state machine". The state remains unchanged until the next invocation.
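The table layout described above can be made concrete with a few lines of Python. This is only a minimal sketch: the function names `position` and `area` are invented for illustration and do not come from any standard.

```python
def position(byte: int) -> str:
    """Return the column/row notation used in the text, e.g. 0x20 -> '2/0'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return f"{byte >> 4}/{byte & 0x0F}"


def area(byte: int) -> str:
    """Name the region of the 8-bit codetable in which a byte falls."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    column = byte >> 4
    if column <= 1:
        return "C0"        # columns 0-1: control characters
    if byte == 0x20:
        return "SPACE"     # position 2/0
    if byte == 0x7F:
        return "DELETE"    # position 7/15
    if column <= 7:
        return "GL"        # the 94 graphic positions of the left half
    if column <= 9:
        return "C1"        # columns 8-9: additional controls
    return "GR"            # columns A-F: 96 positions of the right half
```

For example, `position(0x7F)` gives `7/15` and `area(0x41)` gives `GL`. Counting the bytes for which `area` returns `GL` yields the 94 positions mentioned above; for `GR` the count is 96, of which a 94-character G1 set leaves two unused.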

LOCKING SHIFT functions:
LS0:  G0 in GL    
LS1:  G1 in GL  LS1R:  G1 in GR
LS2:  G2 in GL  LS2R:  G2 in GR
LS3:  G3 in GL  LS3R:  G3 in GR

This approach implies that one has to keep track of the "state" in order to interpret a byte. It follows that the data can only be read in the forward direction, without jumping back (serial communication).
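The dependence on state can be illustrated with a small Python sketch. It assumes a token stream in which the locking shifts appear under their symbolic names (the coded representations of the shifts are not given in this text), and it assumes the starting state described earlier: G0 in GL and G1 in GR. The name `trace` is hypothetical.

```python
# Which G-set each locking shift invokes, and into which area,
# as listed in the table of LOCKING SHIFT functions.
LOCKING_SHIFTS = {
    "LS0": ("GL", "G0"),
    "LS1": ("GL", "G1"),
    "LS2": ("GL", "G2"),
    "LS3": ("GL", "G3"),
    "LS1R": ("GR", "G1"),
    "LS2R": ("GR", "G2"),
    "LS3R": ("GR", "G3"),
}


def trace(tokens):
    """Yield (byte, g_set) for every graphic byte in a token stream.

    Tokens are either shift names (str) or byte values (int).
    """
    state = {"GL": "G0", "GR": "G1"}   # assumed initial invocation
    for token in tokens:
        if isinstance(token, str):
            target_area, g_set = LOCKING_SHIFTS[token]
            state[target_area] = g_set  # stays in force until the next shift
        elif 0x21 <= token <= 0x7E:
            yield token, state["GL"]
        elif 0xA0 <= token <= 0xFF:
            yield token, state["GR"]
        # controls, SPACE and DELETE are ignored in this sketch
```

Feeding it `[0x41, "LS2", 0x41]` reports the first 0x41 as a G0 character and the second as a G2 character: the same byte value, two meanings. A reader that starts in the middle of the stream has no way of knowing which applies.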

When bytes are read, the contents of G0, G1, G2 and G3 must be known. This is achieved by "designation", a control function that carries the identification of a set (represented by bytes referring to a "registration"). Because the designation may be changed during communication, an unlimited number of different characters is available in principle. It must be said that equipment implementing this design has never been marketed.
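As an illustration, the sketch below recognizes the simplest form of designation: ESCAPE, one intermediate byte and one final byte. The assignment of intermediate bytes is not spelled out in this text; it is given here as it is commonly documented for ISO 2022 (single-byte sets only) and should be checked against the standard itself. The names `DESIGNATORS` and `parse_designation` are invented for this example.

```python
ESC = 0x1B

# Intermediate byte -> (G-set being designated, size of the set).
# Assumption: assignment as commonly documented for ISO 2022.
DESIGNATORS = {
    0x28: ("G0", 94),
    0x29: ("G1", 94),
    0x2A: ("G2", 94),
    0x2B: ("G3", 94),
    0x2D: ("G1", 96),
    0x2E: ("G2", 96),
    0x2F: ("G3", 96),
}


def parse_designation(seq: bytes):
    """Return (target, size, final_byte) for a three-byte designation, else None."""
    if len(seq) != 3 or seq[0] != ESC:
        return None
    target = DESIGNATORS.get(seq[1])
    # The final byte identifies the registered set; it lies in 3/0..7/14.
    if target is None or not 0x30 <= seq[2] <= 0x7E:
        return None
    return target[0], target[1], chr(seq[2])
```

With this table, `parse_designation(b"\x1b(B")` is read as "designate the 94-character set with final byte B as G0"; final byte B is the registration generally associated with ASCII.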

A simpler idea is the SINGLE SHIFT. Its effect is that only the immediately following byte is interpreted not as a character from G1, but as one from G2 or G3:

SINGLE SHIFT functions:
SS2:  from G2
SS3:  from G3

These do not affect the state set by the locking shifts. For completeness it should be added that with 7-bit codes the names SO for LS1 and SI for LS0 are permitted.
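The difference from a locking shift can be shown in a few lines of Python. This is a sketch under the same simplification as the description above: the single shifts appear as symbolic tokens, and every other byte is taken from the set currently in force, here G1 by default. The function name `trace_single_shift` is hypothetical.

```python
def trace_single_shift(tokens, locked="G1"):
    """Yield (byte, g_set); 'SS2' and 'SS3' apply to the next byte only."""
    pending = None                      # set requested by a single shift, if any
    for token in tokens:
        if token == "SS2":
            pending = "G2"
        elif token == "SS3":
            pending = "G3"
        else:
            yield token, pending or locked
            pending = None              # the locked state resumes immediately
```

In the stream `[0xE9, "SS2", 0xE9, 0xE9]` only the middle byte is taken from G2; the bytes before and after it are still read from G1.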

The significance of ISO 2022 rests primarily on the administrative framework that it creates, the structural design, and the requirement that code tables be identified. The form chosen to implement these facilities, a control function beginning with ESCAPE followed by intermediate bytes and a final byte, is poorly suited to modern views on user friendliness. Moreover, "serial communication" cannot be reconciled with "direct access". When the cursor is moved on a display, the shift state is lost unless an extra attribute byte records it for every screen position. But doing that in fact introduces double-byte coding (cf. chapter 13).

Because a large variety of G-sets of limited applicability was expected, a registration procedure was set up. It is specified in ISO 2375 and has resulted in the "International Register of Coded Character Sets to be Used with Escape Sequences". This will be discussed in more detail in Chapter 11.

ISO 4873 presents the structure for 8-bit codes, as ISO 646 does for 7-bit codes. One may see ISO 4873 as a simplified ISO 2022, brought in line with reality. There is a C0 and a C1 set, and a G0, G1, G2 and G3. Here G0 is fixed (ASCII), which excludes code extension in GL. There are 3 levels of implementation: level 1 also fixes G1 (which still has to be supplied) and thereby excludes code extension in GR as well; level 2 allows single shift; level 3 adds locking shift. ISO 4873 is thus only a generic standard. For a specification of a fixed G0 and G1 one has to turn to one of the parts of ISO 8859, which assume that ISO 4873 level 1 has been selected.
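The three levels can be summarized as a small lookup, sketched here in Python. The text only says that level 3 "adds locking shift"; because G0 is fixed in GL, this sketch assumes that only the locking shifts acting on GR (LS1R, LS2R, LS3R) are meant. The names `ALLOWED_SHIFTS` and `minimum_level` are invented for illustration.

```python
SINGLE_SHIFTS = {"SS2", "SS3"}
# Assumption: with G0 fixed in GL, only the GR locking shifts apply.
GR_LOCKING_SHIFTS = {"LS1R", "LS2R", "LS3R"}

# Shift functions available at each ISO 4873 level, following the text.
ALLOWED_SHIFTS = {
    1: frozenset(),                                   # G0 and G1 fixed
    2: frozenset(SINGLE_SHIFTS),                      # single shift only
    3: frozenset(SINGLE_SHIFTS | GR_LOCKING_SHIFTS),  # plus locking shift
}


def minimum_level(shifts_used):
    """Return the lowest level that permits all shifts used, or None."""
    used = set(shifts_used)
    for level in (1, 2, 3):
        if used <= ALLOWED_SHIFTS[level]:
            return level
    return None
```

A data stream that uses no shift functions at all, as is the case for the parts of ISO 8859, needs only level 1; one that uses SS2 needs level 2; one that uses LS2R needs level 3.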

ISO 10367 provides a collection of G-sets suitable for use with level 2 or level 3 of ISO 4873. It contains all G0 and all G1 sets from parts 1-9 of ISO 8859, and some additional ones as well (all of these have also been given an IR number, according to ISO 2375). There is a Supplementary Set of letters that do not occur in Latin 1 or 2, and a set of "box" characters. More may be selected from outside ISO 10367, namely those G-sets from the International Register that fit into GR (the "right half" of the table).

Considering the practical effect of all these standards, we have to admit that only ISO 8859 has delivered a realization of ISO 4873 level 1; the others have remained on paper. This is why ISO 4873 is only rarely mentioned.

The requirement, fixed in the structures of ISO 2022 and 4873, to reserve 64 precious positions for rarely used control characters has strongly restricted the applicability of the standards. In Chapter 12 we shall see how industry has meanwhile pushed aside the design of ISO 2022.

At this point it is convenient to present a scheme of the interrelations between the ISO standards for the coding of characters. A list of the titles is given in the Annexes, including those of standards on related subjects. (Many of these are now becoming ISO/IEC standards.)

general design  structure         content C-sets  content G-sets
                7-bit: 646                        646 IRV
                                  6429            IR (2375)            (4873:)
                8-bit: 4873       10538           8859                 (level 1)
2022                              IR (2375)       10367 and IR (2375)  (level 2,3)
                mixed: 6937                       6937
                2/4 octet: 10646                  (no C/G structure)