18. ORDERING PROBLEMS

Sorting is a frequent activity in data processing. If the data are numerical, the way to proceed is obvious; if letters are involved, however, some thought is required. The standards for coded character sets assign to each letter a byte, which may be interpreted as a number. Basing a sort order on these numbers seems reasonable, all the more so because the 26 letters are placed in the code table in the right order. But this leaves open the relative order of small and capital letters, which may differ depending on the structure of the code table. If sorting is to be independent of the code, the right order of characters has to be defined explicitly.
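As an illustration, the following minimal sketch contrasts sorting by code value with sorting by an explicitly defined order. The names ORDER, RANK and explicit_key are hypothetical, and the rule that each small letter precedes its capital is only an assumption chosen for the example; a user might equally well specify the opposite.

```python
from string import ascii_lowercase, ascii_uppercase

words = ["banana", "Cherry", "apple", "Apple"]

# Sorting by code value: in ASCII every capital letter precedes every small letter.
by_code = sorted(words)

# An explicitly defined order (assumption for this example): a A b B c C ... z Z.
ORDER = "".join(lo + up for lo, up in zip(ascii_lowercase, ascii_uppercase))
RANK = {ch: i for i, ch in enumerate(ORDER)}

def explicit_key(word):
    # Map each character to its rank in ORDER; characters not listed in ORDER
    # are placed after all listed ones, in code order.
    return [RANK.get(ch, len(ORDER) + ord(ch)) for ch in word]

by_explicit = sorted(words, key=explicit_key)

print(by_code)      # ['Apple', 'Cherry', 'apple', 'banana']
print(by_explicit)  # ['apple', 'Apple', 'banana', 'Cherry']
```

Because the order is held in a table of its own, the result no longer depends on where the code table happens to place the capitals.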

Furthermore, the alphabets of several languages include more than 26 letters (Danish and Swedish 29, Polish 35). The extra letters are often placed between letters that are traditionally part of the alphabet, which rules out the use of the ISO 646 code table for sorting. Constructing a code table with the right order for one of these languages would make the code positions of the 26 basic letters language dependent, which would make international communication impossible.

EXAMPLE:

Alphabetic order for Polish:

a ą b c ć d e ę f g h i j k l ł m

n ń o ó p q r s ś t u v w x y z ź ż
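A minimal sketch of sorting by this alphabet rather than by code value is given here. The names POLISH, POLISH_RANK and polish_key are hypothetical, the sample words are chosen only for illustration, and the sketch assumes that the input consists of precomposed letters of the Polish alphabet.

```python
# The 35 letters of the Polish alphabet, in alphabetic order.
POLISH = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
POLISH_RANK = {ch: i for i, ch in enumerate(POLISH)}

def polish_key(word):
    # Compare words by the position of each letter in POLISH.
    # Capitals are folded to small letters first (an assumption for this sketch).
    return [POLISH_RANK[ch] for ch in word.lower()]

words = ["łódź", "lato", "zebra", "żaba", "mama", "ćma", "cud", "dom"]

by_code = sorted(words)                       # order of the code values
by_alphabet = sorted(words, key=polish_key)   # order of the Polish alphabet

print(by_code)      # ['cud', 'dom', 'lato', 'mama', 'zebra', 'ćma', 'łódź', 'żaba']
print(by_alphabet)  # ['cud', 'ćma', 'dom', 'lato', 'łódź', 'mama', 'zebra', 'żaba']
```

Sorting by code value pushes every word that begins with an extra letter to the end of the list, whereas the explicit table places "ćma" between "cud" and "dom".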
 
 

In addition, many languages make use of letter combinations (digraphs) that are sorted as if they were a single letter. In the European languages these are the following:
 
 
Latvian  dž ie
Welsh  ch dd ff ng ll ph rh th
Breton  ch c'h
Dutch  ij
Spanish  ch ll
Maltese  għ
Hungarian  cs dz dzs gy ly ny sz zs
Albanian  dh gj ll nj rr sh th xh zh
Croat  dž lj nj
Slovak  ch dz dž
Czech  ch
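
One way to treat such a combination as a single letter is to split each word into sorting elements before comparing, always trying the longest element first. The sketch below does this for the Czech "ch", which is sorted after "h". It is a simplified illustration only: the names ELEMENTS, ELEMENT_RANK and collation_key are hypothetical, and the accented Czech letters are deliberately left out.

```python
# Sorting elements in order: the 26 basic letters with "ch" placed after "h".
# Simplification (assumption): accented Czech letters are not included.
ELEMENTS = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
ELEMENT_RANK = {e: i for i, e in enumerate(ELEMENTS)}
MAX_LEN = max(len(e) for e in ELEMENTS)

def collation_key(word):
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        # Try the longest element first, so that "ch" wins over "c" plus "h".
        for size in range(MAX_LEN, 0, -1):
            piece = word[i:i + size]
            if piece in ELEMENT_RANK:
                key.append(ELEMENT_RANK[piece])
                i += len(piece)
                break
        else:
            # Unknown character: place it after all known elements, in code order.
            key.append(len(ELEMENTS) + ord(word[i]))
            i += 1
    return key

words = ["chata", "cukr", "hrad", "cena", "had", "idea"]

by_code = sorted(words)
by_digraph = sorted(words, key=collation_key)

print(by_code)     # ['cena', 'chata', 'cukr', 'had', 'hrad', 'idea']
print(by_digraph)  # ['cena', 'cukr', 'had', 'hrad', 'chata', 'idea']
```

A different language, or a different order for the same language, only requires a different ELEMENTS list. Note that this purely mechanical splitting cannot tell whether two adjacent letters really form a digraph in a given word or merely happen to stand next to each other.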

The conclusion is that correct sorting requires specific attention and should not be tied to the coding of single characters. Several ingenious algorithms have been devised to sort automatically and correctly (for a given language). But it is the user, not the automation specialist, who has to specify which order is convenient. One consequence may be that more than one order has to be accepted side by side for a single country or language, and developers of standards should be well aware of that. The matter has the attention of several groups in ISO, but nothing has found wide acceptance as yet.