6. INTERNATIONAL STANDARDIZATION OF 8-BIT CODES, ISO 8859

With the rise of computers built around 8-bit structures, from IBM and its followers since 1965, the need to restrict oneself to 7-bit coding systems for reasons of cost gradually disappeared. It made no sense to leave the 8th bit unused. Because EBCDIC had spread on the large machines, it took until 1987 before ISO adopted the first 8-bit code based on ASCII.

The structure of ISO 8859 is that of ISO 646, but "doubled". The codetable uses columns 2-7 and A-F (10-15) for graphic characters. The filling of columns 2-7, the "left half" (GL), is identical to ASCII (and thus to the IRV of ISO 646:1991, not to the old IRV of ISO 646:1983). For the "right half" (GR) a selection had to be made, because the total number of letters required for the European languages written in Latin script is too large for 96 positions. The selections made are designated LATIN-1, LATIN-2, LATIN-3 and LATIN-4, each corresponding to a "Part" of ISO 8859. After approval of these four, the combinations of ASCII with Cyrillic, Arabic, Greek and Hebrew were adopted as further Parts. Because LATIN-3 proved unsatisfactory for Turkish, LATIN-5 was added, and later LATIN-6 for the Scandinavian languages as well.
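The column/row structure described above can be sketched in a few lines of Python. This is only an illustration: the function names `position` and `area` are hypothetical, and the sketch simplifies matters by treating every position in columns 2-7 as GL, although SPACE (02/00) and DELETE (07/15) have a special status there.

```python
def position(byte: int) -> str:
    """Return the column/row notation (e.g. '13/00') of an 8-bit code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit code")
    # The high four bits select the column, the low four bits the row.
    return f"{byte >> 4:02d}/{byte & 0x0F:02d}"


def area(byte: int) -> str:
    """Classify an 8-bit code by the part of the codetable it falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit code")
    column = byte >> 4
    if column <= 1:
        return "C0"   # columns 0-1: control characters
    if column <= 7:
        return "GL"   # columns 2-7: left half, as in ASCII
    if column <= 9:
        return "C1"   # columns 8-9: second set of control characters
    return "GR"       # columns 10-15 (A-F): right half, 96 positions


print(position(0xD0), area(0xD0))
print(position(0x41), area(0x41))
```

Counting the codes for which `area` returns "GR" gives the 96 positions available for each selection of additional letters.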

ISO 8859 8-bit single byte coded graphic character sets, in Parts:

ISO 8859-1:1987 Latin alphabet no. 1
ISO 8859-2:1987 Latin alphabet no. 2
ISO 8859-3:1988 Latin alphabet no. 3
ISO 8859-4:1988 Latin alphabet no. 4
ISO 8859-5:1988 Latin/Cyrillic alphabet
ISO 8859-6:1987 Latin/Arabic alphabet
ISO 8859-7:1987 Latin/Greek alphabet
ISO 8859-8:1988 Latin/Hebrew alphabet
ISO 8859-9:1989 Latin alphabet no. 5
ISO 8859-10:1993 Latin alphabet no. 6

Table 6 presents the codetables for LATIN-1, -2 and -5, and a list of languages covered by the six Latin alphabets.

A list of all characters in ISO 8859-1, -2, -3, -4, with the official names (cf. Chap. 15), the Latin alphabet in which they occur and their code position, is included in the Annexes. (The meaning of the column under SID is explained in Chapter 15.)

Apart from extra letters, the tables also contain an additional set of special signs and marks, which does not, however, have the same extent in all Parts. The copyright sign was obviously only relevant to the capitalist part of Europe. The accents included are to be used free-standing only. NBS stands for NO-BREAK SPACE: a space that differs from the normal one in that word processing does not treat it as a delimiter; it is part of the word itself. SHY stands for SOFT HYPHEN: a hyphen like the HYPHEN-MINUS, but displayed only when a certain condition is satisfied, for example at the end of a line where a word is hyphenated.

ISO 8859 is considerable progress in the sense that it allows the more important Western European languages, such as English, French and German, to be coded without loss of the accents. That the French Œ œ are not available in LATIN-1 is still a matter of regret. Historic mistakes in standards can hardly be repaired afterwards.

For the user in the Netherlands the six Latin alphabets are the most interesting. On implementation, however, one of the six has to be selected, and letters needed for writing languages used in other parts of Europe are then not available. For this reason certain languages cannot be combined in the same text. A quotation from French in a Czech text (see Table 2) cannot be coded correctly with LATIN-2. The barrier that ISO 8859 creates between languages coincides geographically with the Iron Curtain, whose existence it thereby prolongs.
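The limitation can be checked with a short Python sketch, assuming Python's standard codecs `iso8859_1` and `iso8859_2` as stand-ins for LATIN-1 and LATIN-2. The helper name `can_encode` and the two sample phrases are illustrative only.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the given codetable."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


czech = "Příliš žluťoučký kůň"   # sample Czech phrase
french = "très élégant"           # sample French phrase
mixed = f"{czech} ({french})"     # a French quotation inside Czech text

for label, sample in (("Czech", czech), ("French", french), ("mixed", mixed)):
    print(label,
          "LATIN-1:", can_encode(sample, "iso8859_1"),
          "LATIN-2:", can_encode(sample, "iso8859_2"))
```

The Czech phrase fits LATIN-2 and the French phrase fits LATIN-1, but the combination fits neither, because "è" is absent from LATIN-2 and letters such as "ř" are absent from LATIN-1.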

The design of ISO 8859 covered initially:

LATIN-1 Western Europe
LATIN-2 Eastern Europe
LATIN-3 Southern Europe
LATIN-4 Northern Europe

The areas covered sometimes overlap. German and English may be coded with LATIN-1 as well as with LATIN-2. The Annexes include a map showing the areas where one of these two can be applied. The Baltic languages, Maltese and Turkish fall outside them.

It soon became clear that the Turks were dissatisfied with LATIN-3, with the result that a new LATIN-5 was created. Unfortunately, the effect is now that Turkish and Icelandic exclude each other within one text.
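A small Python sketch can make this mutual exclusion concrete. It assumes that Python's standard codecs `iso8859_1`, `iso8859_2`, `iso8859_3`, `iso8859_4`, `iso8859_9` and `iso8859_10` correspond to the six Latin alphabets; the names `LATIN_PARTS` and `alphabets_for` are hypothetical.

```python
# Mapping from the Latin alphabet names to Python codec names (assumed
# to match the corresponding Parts of ISO 8859).
LATIN_PARTS = {
    "LATIN-1": "iso8859_1",
    "LATIN-2": "iso8859_2",
    "LATIN-3": "iso8859_3",
    "LATIN-4": "iso8859_4",
    "LATIN-5": "iso8859_9",
    "LATIN-6": "iso8859_10",
}


def alphabets_for(text: str) -> list[str]:
    """List the Latin alphabets in which every character of text can be coded."""
    suitable = []
    for name, codec in LATIN_PARTS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        suitable.append(name)
    return suitable


print(alphabets_for("ğ"))    # a letter needed for Turkish
print(alphabets_for("þ"))    # a letter needed for Icelandic
print(alphabets_for("ğþ"))   # both letters in one text
```

The Turkish letter is found only in LATIN-3 and LATIN-5, the Icelandic letter only in LATIN-1 and LATIN-6, so the last call returns an empty list.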

The Scandinavians wanted Sami (Lappish) to be included, which resulted in LATIN-6. With these additions the design of ISO 8859, initially so clean, became increasingly untidy. The present situation is therefore rather confusing with respect to North and South.

In Northern Europe LATIN-1 is well suited to the needs of the main languages. LATIN-6 proved unacceptable to the Baltic countries. For these ("Baltic Rim") a codetable was designed which at present has only an ISO-IR registration number (IR 179); it contains Polish as well, but not Sami. How things will develop in practice remains to be seen, now that there are four codetables to choose from:

LATIN-1 : no Sami, Estonian, Latvian, Lithuanian
LATIN-4 : no Icelandic, but Sami, Estonian, Latvian, Lithuanian
LATIN-6 : no Latvian, but Sami, Estonian, Lithuanian, Icelandic
IR 179 : no Sami or Icelandic, but Polish and Baltic languages

For Southern Europe the matter is simpler. LATIN-3 is now required only for Maltese and Esperanto. For Turkish the only choice left is LATIN-5.

After what has been said above, the choice to be made by the Netherlands was no longer difficult. Proceeding from the needs of Western European languages, only LATIN-1 and LATIN-5 could be candidates. Turkish had priority over Icelandic, so Latin alphabet no. 5 (ISO 8859-9) was selected as the national Netherlands standard for an 8-bit coded character set.

Technical developments show that LATIN-1 has already been implemented on a large scale, followed at a significant distance by LATIN-2. But the differences between LATIN-1 and LATIN-5 are not great. If a producer can supply LATIN-1 for an application but not LATIN-5, modifications have to be made in only a few places to adapt the system to the requirements.

The following correspondences exist between the codings of the six letters that make up the difference between the repertoires of LATIN-1 and LATIN-5 ("LATIN" is omitted from the names):

Code   Icelandic (LATIN-1)             Turkish (LATIN-5)

13/00  Ð  CAPITAL LETTER ETH           Ğ  CAPITAL LETTER G WITH BREVE
13/13  Ý  CAPITAL LETTER Y WITH ACUTE  İ  CAPITAL LETTER I WITH DOT ABOVE
13/14  Þ  CAPITAL LETTER THORN         Ş  CAPITAL LETTER S WITH CEDILLA
15/00  ð  SMALL LETTER ETH             ğ  SMALL LETTER G WITH BREVE
15/13  ý  SMALL LETTER Y WITH ACUTE    ı  SMALL LETTER DOTLESS I
15/14  þ  SMALL LETTER THORN           ş  SMALL LETTER S WITH CEDILLA
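The six differing positions can also be derived mechanically. The sketch below assumes that Python's standard codecs `iso8859_1` and `iso8859_9` implement LATIN-1 and LATIN-5; the variable name `differences` is illustrative. The character names come from Python's `unicodedata` module and therefore carry the prefix "LATIN".

```python
import unicodedata

# Collect every 8-bit code that the two codetables interpret differently.
differences = []
for byte in range(0x100):
    raw = bytes([byte])
    latin1 = raw.decode("iso8859_1")
    latin5 = raw.decode("iso8859_9")
    if latin1 != latin5:
        differences.append((byte, latin1, latin5))

# Print each difference with its column/row position and character names.
for byte, latin1, latin5 in differences:
    code = f"{byte >> 4:02d}/{byte & 0x0F:02d}"
    print(f"{code}  {latin1} {unicodedata.name(latin1)}"
          f"  ->  {latin5} {unicodedata.name(latin5)}")
```

Adapting a LATIN-1 system to LATIN-5 at the level of the codetable therefore comes down to replacing the glyphs at these six positions.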

If it is only a matter of display, then modifying the keyboard or printer is the first thing to do. Text processing presents more problems, if only in the handling of the "I": "I" is no longer the capital letter of "i", because Turkish has a SMALL LETTER DOTLESS I and a CAPITAL LETTER I WITH DOT ABOVE.
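The problem with the "I" can be illustrated by a minimal Python sketch. The functions `turkish_upper` and `turkish_lower` are hypothetical helpers written for this illustration, not part of any library; they handle only the two Turkish letter pairs and leave all other casing to Python's default rules.

```python
def turkish_upper(text: str) -> str:
    """Upper-case text with the Turkish pairs i/İ and ı/I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case text with the Turkish pairs İ/i and I/ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


# Default casing pairs "I" with "i", which is wrong for Turkish words.
print("izmir".upper(), "IŞIK".lower())

# The Turkish-aware helpers keep the dotted and dotless letters apart.
print(turkish_upper("izmir"), turkish_lower("IŞIK"))
```

A system converted from LATIN-1 to LATIN-5 would need this kind of special case wherever text is compared or sorted without regard to case.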

TABLE 6

8-BIT CODETABLES

ISO 8859-1, LATIN-1

    00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/
/00         SP  0   @   P   `   p           NBS °   À   Ð   à   ð
/01         !   1   A   Q   a   q           ¡   ±   Á   Ñ   á   ñ
/02         "   2   B   R   b   r           ¢   ²   Â   Ò   â   ò
/03         #   3   C   S   c   s           £   ³   Ã   Ó   ã   ó
/04         $   4   D   T   d   t           ¤   ´   Ä   Ô   ä   ô
/05         %   5   E   U   e   u           ¥   µ   Å   Õ   å   õ
/06         &   6   F   V   f   v           ¦   ¶   Æ   Ö   æ   ö
/07         '   7   G   W   g   w           §   ·   Ç   ×   ç   ÷
/08         (   8   H   X   h   x           ¨   ¸   È   Ø   è   ø
/09         )   9   I   Y   i   y           ©   ¹   É   Ù   é   ù
/10         *   :   J   Z   j   z           ª   º   Ê   Ú   ê   ú
/11         +   ;   K   [   k   {           «   »   Ë   Û   ë   û
/12         ,   <   L   \   l   |           ¬   ¼   Ì   Ü   ì   ü
/13         -   =   M   ]   m   }           SHY ½   Í   Ý   í   ý
/14         .   >   N   ^   n   ~           ®   ¾   Î   Þ   î   þ
/15         /   ?   O   _   o               ¯   ¿   Ï   ß   ï   ÿ

ISO 8859-2, LATIN-2

    00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/
/00         SP  0   @   P   `   p           NBS °   Ŕ   Đ   ŕ   đ
/01         !   1   A   Q   a   q           Ą   ą   Á   Ń   á   ń
/02         "   2   B   R   b   r           ˘   ˛   Â   Ň   â   ň
/03         #   3   C   S   c   s           Ł   ł   Ă   Ó   ă   ó
/04         $   4   D   T   d   t           ¤   ´   Ä   Ô   ä   ô
/05         %   5   E   U   e   u           Ľ   ľ   Ĺ   Ő   ĺ   ő
/06         &   6   F   V   f   v           Ś   ś   Ć   Ö   ć   ö
/07         '   7   G   W   g   w           §   ˇ   Ç   ×   ç   ÷
/08         (   8   H   X   h   x           ¨   ¸   Č   Ř   č   ř
/09         )   9   I   Y   i   y           Š   š   É   Ů   é   ů
/10         *   :   J   Z   j   z           Ş   ş   Ę   Ú   ę   ú
/11         +   ;   K   [   k   {           Ť   ť   Ë   Ű   ë   ű
/12         ,   <   L   \   l   |           Ź   ź   Ě   Ü   ě   ü
/13         -   =   M   ]   m   }           SHY ˝   Í   Ý   í   ý
/14         .   >   N   ^   n   ~           Ž   ž   Î   Ţ   î   ţ
/15         /   ?   O   _   o               Ż   ż   Ď   ß   ď   ˙

ISO 8859-9, LATIN-5

    00/ 01/ 02/ 03/ 04/ 05/ 06/ 07/ 08/ 09/ 10/ 11/ 12/ 13/ 14/ 15/
/00         SP  0   @   P   `   p           NBS °   À   Ğ   à   ğ
/01         !   1   A   Q   a   q           ¡   ±   Á   Ñ   á   ñ
/02         "   2   B   R   b   r           ¢   ²   Â   Ò   â   ò
/03         #   3   C   S   c   s           £   ³   Ã   Ó   ã   ó
/04         $   4   D   T   d   t           ¤   ´   Ä   Ô   ä   ô
/05         %   5   E   U   e   u           ¥   µ   Å   Õ   å   õ
/06         &   6   F   V   f   v           ¦   ¶   Æ   Ö   æ   ö
/07         '   7   G   W   g   w           §   ·   Ç   ×   ç   ÷
/08         (   8   H   X   h   x           ¨   ¸   È   Ø   è   ø
/09         )   9   I   Y   i   y           ©   ¹   É   Ù   é   ù
/10         *   :   J   Z   j   z           ª   º   Ê   Ú   ê   ú
/11         +   ;   K   [   k   {           «   »   Ë   Û   ë   û
/12         ,   <   L   \   l   |           ¬   ¼   Ì   Ü   ì   ü
/13         -   =   M   ]   m   }           SHY ½   Í   İ   í   ı
/14         .   >   N   ^   n   ~           ®   ¾   Î   Ş   î   ş
/15         /   ?   O   _   o               ¯   ¿   Ï   ß   ï   ÿ

List of languages with the numbers of the Latin alphabets that contain the required letters

Language      1   2   3   4   5   6    Language      1   2   3   4   5   6

Lithuanian                4       6    French*       1               5
Latvian                   4            Catalan       1               5
Estonian                  4       6    Spanish       1               5
Finnish       1           4   5   6    Galician      1               5
Sami                      4       6    Portuguese    1               5
Swedish       1           4   5   6    Basque        1               5
Norwegian     1               5        Maltese               3
Danish        1               5        Italian       1               5
Faroese       1                        Rhaetian      1               5
Icelandic     1                        Romanian          2
Greenlandic   1               5        Hungarian         2
Gaelic        1               5        Albanian          2
Irish         1               5        Turkish              (3)      5
Welsh                                  Croat             2
Breton        1               5        Slovene           2
English       1   2   3   4   5   6    Slovak            2
Frisian       1               5        Czech             2
Dutch         1               5        Polish            2
Afrikaans     1               5        Sorbian           2
German        1   2           5        Esperanto             3

* without Œ œ
