16. CONVERSION AND TRANSLITERATION

We have seen that stored data often differ from one another in their repertoire of letters, in script and in coding. For this reason, these properties have to be modified when data are compared, merged or interchanged.

"Conversion" is the term for a change in the coding of characters that leaves the characters themselves unmodified. For this purpose a conversion table is required, indicating which byte is to be replaced by which other byte. In principle, one table is needed for every ordered pair of distinct code tables, that is, N*(N-1) tables for N code tables. When data are transmitted over a network, a suitable table is often built in at both the sender side and the receiver side. In addition, the communication system may work internally with a code table of its own, which the user does not need to know. If one side works with octets while the other side, or the network itself, processes only septets, information may be lost without the user being alerted, unless arrangements are made to protect the data.
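
A conversion table of this kind can be sketched as a 256-entry byte array. The sketch below is a minimal illustration assuming single-byte code tables; it uses Python's built-in codecs (cp437 and latin-1, chosen only as examples) to derive the table, and the substitute byte "?" for characters the target table lacks, as well as all helper names, are hypothetical. It also simulates a septet channel that drops the eighth bit, to show how data can be damaged without any error being raised.

```python
# Sketch of a byte-for-byte conversion table between two single-byte
# code tables. Python codecs stand in for the code tables; the codec
# names, the substitute byte and all helper names are illustrative.

SUBSTITUTE = 0x3F  # '?': used when the target table lacks the character


def build_conversion_table(source: str, target: str) -> bytes:
    """Return t such that t[b] is the target byte for source byte b."""
    table = bytearray(256)
    for b in range(256):
        try:
            encoded = bytes([b]).decode(source).encode(target)
        except (UnicodeDecodeError, UnicodeEncodeError):
            table[b] = SUBSTITUTE  # information is lost here
            continue
        # Accept only single-byte results; anything else is substituted.
        table[b] = encoded[0] if len(encoded) == 1 else SUBSTITUTE
    return bytes(table)


def convert(data: bytes, table: bytes) -> bytes:
    """Replace every byte by its entry in the conversion table."""
    return data.translate(table)


def strip_to_septets(data: bytes) -> bytes:
    """Simulate a 7-bit channel that silently drops the eighth bit."""
    return bytes(b & 0x7F for b in data)


def tables_needed(n: int) -> int:
    """Directed conversion tables needed among n code tables."""
    return n * (n - 1)


if __name__ == "__main__":
    cp437_to_latin1 = build_conversion_table("cp437", "latin-1")
    converted = convert("M\u00fcller".encode("cp437"), cp437_to_latin1)
    print(converted.decode("latin-1"))
    print(strip_to_septets("M\u00fcller".encode("latin-1")))
    print(tables_needed(4))
```

With this table, the cp437 bytes for "Müller" convert to the Latin-1 bytes for the same name, whereas those Latin-1 bytes passed through the simulated septet channel come out as M|ller (0xFC for ü loses its top bit and becomes 0x7C), with no error raised.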

If it is the script that has to be changed, one speaks of "transliteration" when characters are replaced by other characters mechanically, without regard to their meaning or pronunciation, and of "transcription" when these aspects are taken into account. Accordingly, a certain Russian name is transliterated as Ershev and transcribed as Yershov. In transliteration it is important to know whether the chosen method is "reversible", that is, whether back-transliteration restores the original text exactly. Particular caution is required when the mapping is not one-to-one: the name Khrushchov has only 6 letters in Russian, but 10 in this Latin form.
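
The difference between a reversible and a non-reversible method can be shown with two small lowercase tables. Both tables below are illustrative assumptions covering only a handful of letters; neither is quoted from ISO 9 or any other standard. The first uses digraphs such as kh and shch and maps ё and е alike to e, so distinct Cyrillic strings can collide; the second assigns a single Latin character to each letter, which is a sufficient condition for exact back-transliteration. The function names are hypothetical.

```python
# Two illustrative lowercase Cyrillic-to-Latin tables (assumptions, not
# taken from ISO 9 or any other standard); only a few letters are covered.
YO = "\u0451"  # Cyrillic small letter io, written as an escape

DIGRAPH_TABLE = {
    "х": "kh", "р": "r", "у": "u", "щ": "shch", YO: "e", "е": "e",
    "в": "v", "ш": "sh", "о": "o", "ц": "ts", "т": "t", "с": "s", "к": "k",
}

ONE_TO_ONE_TABLE = {
    "х": "h", "р": "r", "у": "u", "щ": "\u015d", YO: "\u00eb", "е": "e",
    "в": "v", "ш": "\u0161", "о": "o", "ц": "c", "т": "t", "с": "s", "к": "k",
}


def transliterate(text: str, table: dict) -> str:
    """Replace each letter mechanically; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def is_one_to_one(table: dict) -> bool:
    """Sufficient test for reversibility: each letter maps to exactly one
    Latin character and no two letters share the same image."""
    images = list(table.values())
    return all(len(v) == 1 for v in images) and len(set(images)) == len(images)


def back_transliterate(text: str, table: dict) -> str:
    """Invert a one-to-one table; assumes the original text used only
    letters present in the table."""
    if not is_one_to_one(table):
        raise ValueError("table is not one-to-one; back-transliteration is ambiguous")
    inverse = {latin: cyr for cyr, latin in table.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


if __name__ == "__main__":
    name = "хрущ" + YO + "в"
    print(len(name), transliterate(name, DIGRAPH_TABLE))
    print(transliterate("тс", DIGRAPH_TABLE), transliterate("ц", DIGRAPH_TABLE))
    latin = transliterate(name, ONE_TO_ONE_TABLE)
    print(back_transliterate(latin, ONE_TO_ONE_TABLE) == name)
```

Under the digraph table, тс and ц both become ts, and the 6-letter хрущёв becomes the 10-letter khrushchev, so the original cannot be recovered with certainty. The one-to-one table yields hruŝëv, which transliterates back unchanged, at the price of diacritics that not every system can process.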

The term "transformation" may be used for representing every letter of a large repertoire by a combination of characters taken from a much smaller set. Rules for applying this idea in Europe are under study. The matter is of great practical importance, because networks are to a considerable extent based on 7-bit coding, which transmits letters outside the ASCII repertoire only in damaged form, if at all.
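
A transformation of this kind can be sketched by escaping each non-ASCII letter as a short ASCII sequence. The scheme below is hypothetical and is not one of the European rules under study: an escape character & is followed by the base letter and one ASCII mark for its diacritic, and a literal & is doubled. It relies on Unicode canonical decomposition (unicodedata.normalize with "NFD" and "NFC") to split a letter into base and diacritic and to recombine them, and it handles only the five diacritics in its table.

```python
import unicodedata

# Hypothetical mnemonic scheme (an assumption for illustration, not a
# standard): "&" + base letter + one ASCII mark; "&&" stands for "&".
MARKS = {
    "\u0308": ":",  # combining diaeresis
    "\u0301": "'",  # combining acute accent
    "\u0300": "`",  # combining grave accent
    "\u0302": "^",  # combining circumflex accent
    "\u030c": "<",  # combining caron
}
MARK_TO_COMBINING = {mark: comb for comb, mark in MARKS.items()}
ESCAPE = "&"


def to_ascii(text: str) -> str:
    """Transform text into 7-bit-safe ASCII using the mnemonic scheme."""
    out = []
    for ch in text:
        if ch == ESCAPE:
            out.append(ESCAPE * 2)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base, marks = decomposed[0], decomposed[1:]
            if len(marks) != 1 or marks not in MARKS or ord(base) >= 128:
                raise ValueError(f"no mnemonic for {ch!r}")
            out.append(ESCAPE + base + MARKS[marks])
    return "".join(out)


def from_ascii(text: str) -> str:
    """Reverse to_ascii, recomposing base letter and diacritic."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESCAPE:
            out.append(ch)
            i += 1
        elif text[i + 1:i + 2] == ESCAPE:
            out.append(ESCAPE)
            i += 2
        else:
            if i + 2 >= len(text) or text[i + 2] not in MARK_TO_COMBINING:
                raise ValueError(f"malformed escape at position {i}")
            base, mark = text[i + 1], text[i + 2]
            out.append(unicodedata.normalize("NFC", base + MARK_TO_COMBINING[mark]))
            i += 3
    return "".join(out)


if __name__ == "__main__":
    encoded = to_ascii("M\u00fcller & Dvo\u0159\u00e1k")
    print(encoded)
    print(from_ascii(encoded))
```

For example, "Müller & Dvořák" becomes M&u:ller && Dvo&r<&a'k, which consists of ASCII characters only and is restored exactly by from_ascii. Doubling the escape character is what keeps the scheme reversible.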

ISO standards for transliteration are being developed by ISO/TC46/SC2. A fair number of them have been approved so far, but the result is disappointing. Transliteration is used to enable people to handle an unavailable script with the help of commonly available characters. Unfortunately, the TC46 standards assume that every combination of letters and diacritics can be processed, which is not true in general, and certainly not with T.61 or ISO 6937. Anyone who has an implementation of ISO 5426 at their disposal has no problem, but this situation is extremely rare. There is no simple way out of the transliteration issue. The TC46 standards should be referenced for use in applications only with extreme reluctance. In practice, the official translator decides on the transliteration case by case. These standards are:

ISO 9 Documentation - Transliteration of Slavic Cyrillic characters into Latin characters

ISO 233 Documentation - Transliteration of Arabic characters into Latin characters

Part 1. Stringent transliteration
Part 2. Simplified transliteration

ISO 259 Documentation - Transliteration of Hebrew characters into Latin characters

Part 1. Stringent transliteration
Part 2. Simplified transliteration

ISO 843 Documentation - Transliteration of Greek characters into Latin characters

ISO 3602 Documentation - Romanization of Japanese (kana script)

ISO 7098 Documentation - Romanization of Chinese