Sorting is a frequent activity in data processing. For numerical data the way to proceed is obvious; when letters are involved, however, some thought is required. The standards for coded character sets assign to each letter a byte, which may be interpreted as a number. Basing the sorting order on these numbers seems reasonable, all the more so because the 26 letters are placed in the code table in the right order. But this leaves open the relative order of small and capital letters, which may differ depending on the structure of the code table. If sorting is to be independent of the code, the right order of the characters has to be defined explicitly.
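The difference between an order taken from the code table and an explicitly defined order can be sketched as follows. This is a minimal illustration, not a complete collation routine: the word list and the chosen order (each small letter directly before its capital) are assumptions made for the example.

```python
import string

words = ["banana", "Cherry", "apple", "Banana"]

# Order derived from the code table: in ASCII (ISO 646) every capital
# letter (65-90) precedes every small letter (97-122).
by_code = sorted(words)

# Explicitly defined order (an assumption for this sketch): each small
# letter directly before its capital, i.e. a A b B c C ...
EXPLICIT_ORDER = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)
RANK = {ch: i for i, ch in enumerate(EXPLICIT_ORDER)}


def explicit_key(word):
    """Map a word to the ranks of its letters in EXPLICIT_ORDER."""
    # Characters outside the 52 letters raise KeyError in this sketch.
    return [RANK[ch] for ch in word]


by_explicit = sorted(words, key=explicit_key)

print(by_code)      # ['Banana', 'Cherry', 'apple', 'banana']
print(by_explicit)  # ['apple', 'banana', 'Banana', 'Cherry']
```

The second result does not depend on where the letters sit in the code table; only the explicit order string decides.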
Furthermore, the alphabets of several languages contain more than 26 letters (Danish and Swedish 29, Polish 35). The extra letters are often placed between letters that are traditionally part of the alphabet, which excludes the use of the ISO 646 code table for sorting. Any attempt to construct a code table with the right order for one of these languages would make the code positions of the 26 basic letters language dependent, which would make international communication impossible.
EXAMPLE:
Alphabetic order for Polish:
a ą b c ć d e ę f g h i j k l ł m
n ń o ó p q r s ś t u v w x y z ź ż
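A language-specific order can be kept in a table of its own, separate from the character code. The sketch below sorts a few Polish words by the 35-letter alphabet and compares the result with the order given by the code values (here Unicode code points). The word list is illustrative, and ignoring the difference between small and capital letters is an assumption of the example.

```python
# The 35 letters of the Polish alphabet in their alphabetical order.
POLISH = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {ch: i for i, ch in enumerate(POLISH)}


def polish_key(word):
    """Rank each letter by its position in the Polish alphabet."""
    # Case is ignored here; other characters raise KeyError in this sketch.
    return [RANK[ch] for ch in word.lower()]


words = ["łuk", "lato", "zebra", "żaba", "źrebak", "mama", "ćma", "cena", "dom"]

by_code = sorted(words)                    # order of the code values
by_alphabet = sorted(words, key=polish_key)

print(by_code)
# ['cena', 'dom', 'lato', 'mama', 'zebra', 'ćma', 'łuk', 'źrebak', 'żaba']
print(by_alphabet)
# ['cena', 'ćma', 'dom', 'lato', 'łuk', 'mama', 'zebra', 'źrebak', 'żaba']
```

The code positions of the letters stay the same for every language; only the order table changes.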
In addition to this, many languages make use of letter combinations
(digraphs) that are sorted as if they were a single letter. In the European
languages these are the following:
Latvian   | dž ie                       |
Welsh     | ch dd ff ng ll ph rh th     |
Breton    | ch c'h                      |
Dutch     | ij                          |
Spanish   | ch ll                       |
Maltese   | għ                          |
Hungarian | cs dz dzs gy ly ny sz zs    |
Albanian  | dh gj ll nj rr sh th xh zh  |
Croat     | dž lj nj                    |
Slovak    | ch dz dž                    |
Czech     | ch                          |
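Letter combinations that count as a single letter can be handled by splitting each word into collating elements before ranking them. The sketch below is one possible approach, not an established algorithm: the function name make_key and the word list are hypothetical, and the Czech order is simplified to the 26 basic letters with "ch" placed after "h" (the letters with diacritics are left out).

```python
import string


def make_key(alphabet):
    """Build a sort key from an ordered list of collating elements."""
    rank = {elem: i for i, elem in enumerate(alphabet)}
    # Try longer elements first, so that "ch" wins over "c"
    # (and, for Hungarian, "dzs" over "dz").
    by_length = sorted(alphabet, key=len, reverse=True)

    def key(word):
        word = word.lower()
        ranks, pos = [], 0
        while pos < len(word):
            for elem in by_length:
                if word.startswith(elem, pos):
                    ranks.append(rank[elem])
                    pos += len(elem)
                    break
            else:
                raise ValueError(f"no collating element at {word[pos:]!r}")
        return ranks

    return key


# Simplified Czech order (assumption): basic letters with "ch" after "h".
letters = list(string.ascii_lowercase)
h = letters.index("h")
CZECH = letters[: h + 1] + ["ch"] + letters[h + 1 :]

words = ["cibule", "chata", "hrad", "jablko", "cena", "chleba"]

by_code = sorted(words)
by_czech = sorted(words, key=make_key(CZECH))

print(by_code)   # ['cena', 'chata', 'chleba', 'cibule', 'hrad', 'jablko']
print(by_czech)  # ['cena', 'cibule', 'hrad', 'chata', 'chleba', 'jablko']
```

The greedy matching treats every occurrence of the letter sequence as the digraph. That is an assumption of this sketch and may not hold for every word in every language.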
The conclusion is that correct sorting is a subject that requires
specific attention and should not be tied to the coding of single
characters. Several ingenious algorithms have been invented to sort
automatically and correctly (for a given language). But it is the user,
not the automation specialist, who has to specify which order is
convenient. One consequence may be that more than one order per country
or language has to be accepted side by side, and the developers of
standards should be well aware of that. The matter has the attention of
several groups in ISO, but nothing has found wide acceptance as yet.