Yuri Demchenko
Kiev Polytechnic Institute, Ukraine <demch@cad.ntu-kpi.kiev.ua>

Konstantin Chuguev

Borka Jerman-Blazic
Jozef Stefan Institute, Slovenia <borka@e5.ijs.si>

Claudio Allocchio
The paper describes the TERENA Pilot Project on Multilingual MUAs Testing. It discusses the problems of multilingual support in Mail User Agents (MUAs) operating in a multilingual environment such as Europe, and presents the general model of multilingual support in MUAs applied in the project. The methodology used and the recommendations for basic and extended testing of MUA multilingual support are described.
1.0 Introduction
2.0 Understanding multilingual support
3.0 The TERENA MUAs multilingual support evaluation pilot project
4.0 The testing and the evaluation scheme
4.1 Evaluation of Multilingual features/settings of MUAs
4.2 Message Reading procedure
4.3 Message Composing procedure
4.4 Sending/Receiving Messages procedures
5.0 Test Messages Set
6.0 Testing Methodology
7.0 Conclusion
8.0 References
Acknowledgment
Appendix A. List of MUAs to be tested

1.0 Introduction
Notwithstanding the advent of Unicode, non-English-speaking users usually work in a local multilingual (ML) environment (especially in the case of e-mail) that is not Unicode-based and is likely to remain so for a long time. Current practice in ML MUAs is to use an internal Unicode representation of ML texts and messages, which provides conversion or mapping between multiple character sets and languages.
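A minimal Python sketch of this practice, assuming Python's built-in codecs stand in for an MUA's own conversion tables: text received in one legacy charset is decoded into an internal Unicode string and re-encoded in the charset the local display expects. The function name `transcode` is illustrative.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from charset `src` to charset `dst` via Unicode.

    `src` and `dst` are Python codec names (e.g. "koi8_r", "cp1251").
    Raises UnicodeDecodeError / UnicodeEncodeError when a byte or a
    character has no mapping, instead of silently showing wrong glyphs.
    """
    text = data.decode(src)   # legacy bytes -> internal Unicode representation
    return text.encode(dst)   # internal Unicode -> bytes for the target charset


if __name__ == "__main__":
    original = "Привет, мир"                        # Russian sample text
    koi8 = original.encode("koi8_r")                # as sent by a KOI8-R mailer
    cp1251 = transcode(koi8, "koi8_r", "cp1251")    # as a CP1251 viewer needs it
    print(koi8 != cp1251)                           # True: same text, different bytes
    print(cp1251.decode("cp1251") == original)      # True: nothing was lost
```

The conversion is lossless only when the target repertoire contains every character: the Ukrainian letter "Ґ" exists in KOI8-U but not in KOI8-R, so `transcode("Ґ".encode("koi8_u"), "koi8_u", "koi8_r")` raises `UnicodeEncodeError`.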
The importance of support for internationalization (i18n) in all Internet services, especially the WWW and e-mail, is widely recognized in many international forums. Recently, a strategy for supporting multilinguality in all Internet services was developed and adopted by the IAB and invited experts, and the result was published as an Informational RFC (RFC 2130). The basic architectural model and the basic technologies have also been adopted by many organizations; however, the applications used over the Internet do not yet support the proposed models and services at a satisfactory level. This is why internationalization problems are still a major discussion topic in many working groups and technical committees, such as ISO/IEC JTC 1/SC 22/WG 20, TC2, the TERENA WG-i18n [1] and WG-MSG [10], CEN/TC304 and CEN/ISSS [2]. A special project addressing the implementation of internationalization standards and best practice in e-mail applications is also currently being undertaken by the IMC (Internet Mail Consortium) [3].
Today, the basic standardization work for supporting multilinguality in e-mail applications is fairly well established: the MIME (Multipurpose Internet Mail Extensions) set of standards gives clear specifications for using any language in message body parts and in various header fields without any danger that the underlying e-mail network infrastructure will damage the content. The key information carried in the header fields that prevents damage in transport is the coded character set used and the transfer encoding scheme, such as base64 (see RFC 2045 - RFC 2049).
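For illustration, Python's standard `email` package records both pieces of information when it builds a text part in a national character set. The helper name `make_text_part` is hypothetical, and the choice of base64 for KOI8-R is the `email` package's own default, not a MIME requirement.

```python
from email.mime.text import MIMEText


def make_text_part(text: str, charset: str) -> MIMEText:
    """Build a text/plain part labelled with `charset`.

    MIMEText encodes `text` in `charset`, adds the charset parameter to
    Content-Type and picks the Content-Transfer-Encoding that the email
    package associates with that charset (base64 for koi8-r).
    """
    return MIMEText(text, "plain", charset)


if __name__ == "__main__":
    part = make_text_part("Проверка кодировки", "koi8-r")
    print(part.get_content_charset())          # koi8-r
    print(part["Content-Transfer-Encoding"])   # base64
    print(part.as_string().isascii())          # True: safe for 7-bit transport
```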
However, in practice e-mail user agents (MUAs) provide different levels of multilingual support. There is also no standard scheme for assessing their functionality and their level of compliance with the existing standards and recommendations. As a consequence, a common testing methodology (a Multilingual MUA benchmark) has not yet been developed and is not yet available.
Usually, developers test the multilingual operation of an MUA in a very simplistic way: only the simplest operations are evaluated, such as typing, editing, sending, receiving/viewing, replying to and forwarding messages. From the user's point of view, very important operations that are usually performed when working in a multilingual environment remain untested. Typical operations that are missing in manufacturers' testing of MUA software are:
2.0 Understanding multilingual support
One of the most relevant factors contributing to the existing confusion in the provision of multilingual content is the large number of existing international and proprietary standards: they complicate content creation and viewing and, as a consequence, also the development and testing of internationalized software for Internet services.
Multilingual Internet products are not just multilingual user interfaces that allow typing and reading messages in selected languages. Multilingual support, for example in Internet mail systems, assumes the use of a set of content-safe mail techniques and standards that make it possible to compose, transfer and read e-mail messages containing information in different languages, character sets and encodings. In addition, these systems must support negotiation between MUAs and MTAs regarding the languages, character sets and encodings used for each message being transferred. The Internet e-mail service is not interactive: it is a store-and-forward service. Therefore multilingual negotiation here differs from the negotiation used between World Wide Web servers and clients.
The essential method for verifying the multilingual support of a set of MUAs and MTAs over the Internet e-mail service consists of examining how a test message, composed of a set of codes representing the originator's language and alphabet, is processed. When the test message is sent over a number of heterogeneous servers and the multilingual support is correct, the recipient must receive a readable message identical to the one sent by the originator. The key to correct delivery of multilingual content in electronic mail is to ensure that the message composing software (content creation software) and the recipient's message reader apply the same convention for mapping the message's codes to displayable characters (mapping bit combinations to glyphs).
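The round-trip criterion can be sketched in Python, assuming the standard `email` package stands in for both MUAs and that no servers sit in between; a real test would relay the output of `compose()` through the MTAs under test before calling `read()`. The function names are hypothetical.

```python
from email import message_from_bytes
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText


def compose(subject: str, body: str, charset: str) -> bytes:
    """Originator side: label and encode the message, return the wire bytes."""
    msg = MIMEText(body, "plain", charset)
    msg["Subject"] = Header(subject, charset)   # RFC 2047 encoded-word
    return msg.as_bytes()


def read(raw: bytes) -> tuple:
    """Recipient side: map the received codes back to characters."""
    msg = message_from_bytes(raw)
    subject = str(make_header(decode_header(msg["Subject"])))
    charset = msg.get_content_charset()
    body = msg.get_payload(decode=True).decode(charset)
    return subject, body


def round_trip_ok(subject: str, body: str, charset: str) -> bool:
    """True if the recipient sees exactly what the originator wrote."""
    return read(compose(subject, body, charset)) == (subject, body)
```

For example, `round_trip_ok("Проверка", "Текст письма", "koi8-r")` checks a Russian message labelled as KOI8-R.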
According to CEN TC 304, there are 160 indigenous European languages. The Internet-literate European multilingual community uses more than 30 languages, represented by many character sets with different repertoires and different encodings. A property common to all of them is the use of the character-box (or glyph-box) representation and of single-byte character sets (SBCS), i.e. each character occupies one display position. This distinguishes them from many languages used outside Europe.
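The single-byte property can be checked with Python's codecs, used here as an assumption in place of the national standards themselves: in the national SBCS every character takes one byte, whereas UTF-8 needs two bytes for each Cyrillic or Greek letter. The helper name is illustrative.

```python
def bytes_per_char(text: str, codec: str) -> float:
    """Average number of bytes per character of a non-empty `text` in `codec`."""
    return len(text.encode(codec)) / len(text)


if __name__ == "__main__":
    for text, sbcs in [("Україна", "koi8_u"), ("Ελλάδα", "iso8859_7"), ("Łódź", "iso8859_2")]:
        # Single-byte national charset versus multi-byte UTF-8.
        print(text, bytes_per_char(text, sbcs), bytes_per_char(text, "utf_8"))
```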
Most European languages use the Latin script, whose base consists of the 26 letters of the English alphabet (A through Z) in upper and lower case. Some European languages, such as French, Spanish or Icelandic, use additional characters, and many other characters are regarded as composed from a base letter and a diacritical mark, as specified in a few basic ISO standards (such as ISO 6937). Fourteen diacritical marks, commonly called "accent marks", which allow nearly 200 diacritical combinations, complete the set for European languages [4].
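Unicode models the same idea of a base letter plus a diacritic with combining marks, so the composition can be inspected with Python's `unicodedata` module. This is an analogy, not ISO 6937 itself: ISO 6937 encodes the non-spacing diacritic before the base letter, while Unicode places the combining mark after it. The helper name is illustrative.

```python
import unicodedata


def decompose(ch: str) -> tuple:
    """Split one precomposed letter into its base letter and the names of
    its combining diacritical marks (canonical decomposition, NFD)."""
    parts = unicodedata.normalize("NFD", ch)
    return parts[0], [unicodedata.name(mark) for mark in parts[1:]]


if __name__ == "__main__":
    for letter in ["é", "č", "ñ", "þ", "ø"]:
        print(letter, decompose(letter))
```

Letters such as "é", "č" and "ñ" split into a base letter and one combining mark, while additional letters such as Icelandic "þ" or "ø" have no decomposition and come back unchanged.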
The repertoires of the official languages of the member states of the European Union (EU) are specified in ISO 8859-1, while the repertoires of the Central and Eastern European languages using the Latin alphabet are specified in ISO 8859-2. The Greek alphabet is specified in ISO 8859-7 and the Cyrillic alphabet used in Europe in ISO 8859-5 [5]. Widely used operating systems such as UNIX and Microsoft Windows use their own character sets (such as the Windows code pages 1250 - 1258, also called "ANSI" code pages) to support European languages, including the Cyrillic languages (Russian, Ukrainian, Belorussian, Bulgarian, etc.) in CP1251 [6]. The de facto standards for mail and news exchange, as well as for WWW information, in the Russian- and Ukrainian-speaking communities are KOI8-R (RFC 1489) and KOI8-U (RFC 2319). These different character set codes, implemented in different operating systems, are the main source of incompatibility in the message content produced by MUAs running on these systems.
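The incompatibility is easy to demonstrate with Python's built-in codecs, used here as an assumption in place of each system's own tables: the same Cyrillic letter has a different byte value in ISO 8859-5, CP1251 and KOI8-R, so a reader that applies the wrong table shows different letters. The helper names are illustrative.

```python
CODECS = ("iso8859_5", "cp1251", "koi8_r")


def positions(ch: str) -> dict:
    """Byte value of one Cyrillic character in each single-byte charset."""
    return {codec: ch.encode(codec)[0] for codec in CODECS}


def misread(text: str, sent_as: str, read_as: str) -> str:
    """What a reader shows when it applies the wrong charset to the bytes."""
    return text.encode(sent_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    print({codec: hex(b) for codec, b in positions("Ж").items()})
    # {'iso8859_5': '0xb6', 'cp1251': '0xc6', 'koi8_r': '0xf6'}
    print(misread("Привет", "koi8_r", "cp1251"))   # рТЙЧЕФ
```

The output "рТЙЧЕФ" is the classic symptom of KOI8-R text displayed as CP1251.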
3.0 The TERENA MUAs multilingual support evaluation pilot project
The Pilot Project to evaluate the most popular multilingual MUAs was launched at the beginning of 1998 by TERENA WG-MSG and WG-i18n [7]. The project's major goal is to provide its user community (within the TERENA technical program) with consistent information and practical recommendations about the level and quality of multilingual support in these widely used MUAs. The project officially started in April 1998 and is expected to deliver its results by September 1998. By means of evaluation tests, the project will specify the properties of each particular MUA and provide the necessary recommendations and instructions for configuring it correctly.
The tests will be performed on a number of MUAs in order to determine their behavior when configured to work with different national character sets. The list of mail clients to be tested was derived from TERENA MUA usage statistics, based on an analysis of more than 3000 messages from the TERENA Mail archives collected during the period August 1997 - March 1998. The list was approved by the TERENA Working Group for Internationalization of the network services (WG-I18N) and the TERENA Working Group for Mail and Messaging (WG-MSG). An additional survey performed during the first three months of 1998 confirmed the general distribution already known, so the list of products was not changed; only a number of newly released products were added to the test list. A summary of the collected MUA usage statistics is given in Appendix A, while more detailed information can be retrieved from the Pilot Project's homepage [8].
The project plans to evaluate the MUAs in general, but some evaluation of local e-mail encoding practice will also be provided for special cases with a large user community. This applies to countries that use special coded character sets for their languages, e.g. Russia, Ukraine and Belarus (for DOS, Windows, UNIX, Macintosh, etc.) and some Central European countries such as Poland, Slovenia, the Czech Republic, Hungary, etc.
In addition to its main goals, the project is expected to propose a basic, common test scheme for multilingual MUA benchmarking. This part of the project will be based on the experience collected by the test teams and the problems they encounter while performing the evaluations. A special set of composite multilingual test messages will also be developed in collaboration with test teams in several countries (e.g. in Central European countries, Russia and Ukraine), so that interoperability tests can be performed smoothly and easily.
The project activities are currently being carried out by an international team including representatives from Ukraine, Russia, Slovenia and the Scandinavian countries. Other participants are welcome.
The project objectives can be described as:
where all necessary information about the project and a collection of reference documents and standards are provided.
Results of the project will be made available on the TERENA WWW pages and presented at various Internet-related conferences and user training workshops in cooperation with TERENA WG-ISUS and TERENA WG-MSG.
4.0 The testing and the evaluation scheme
A general model for verifying the multilingual support in MUAs is presented
in Fig. 1. The model, developed within the Pilot Project itself, presents
in flow-chart form the whole process envisaged for testing and evaluation,
i.e. message composing, sending, receiving and reading. The various entries
make it possible to set up the multilingual features and to trace the
possible locations of problems.
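One way to picture how the model traces problem locations is a toy pipeline in Python. The stage names, the `seven_bit_hop` relay that clears the eighth bit, and the helper functions are all hypothetical; they only stand in for the real compose, send, receive and read steps of Fig. 1.

```python
import base64
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[bytes], bytes]]


def clean_hop(data: bytes) -> bytes:
    """A relay that passes the message through unchanged."""
    return data


def seven_bit_hop(data: bytes) -> bytes:
    """Hypothetical relay that is not 8-bit clean: it clears the high bit."""
    return bytes(b & 0x7F for b in data)


def locate_damage(text: str, codec: str, stages: List[Stage],
                  use_base64: bool) -> Optional[str]:
    """Return the name of the first stage after which the message no longer
    decodes back to `text`, or None if it arrives intact."""
    wire = text.encode(codec)                    # composing
    if use_base64:
        wire = base64.b64encode(wire)            # MIME transfer encoding
    for name, stage in stages:                   # sending / relaying / receiving
        wire = stage(wire)
        data = base64.b64decode(wire) if use_base64 else wire
        if data.decode(codec, errors="replace") != text:   # reading
            return name
    return None


PATH = [("sender MTA", clean_hop), ("relay", seven_bit_hop), ("recipient MTA", clean_hop)]
```

Without a transfer encoding the KOI8-R message is damaged at the relay; with base64 it arrives intact. KOI8-R was designed so that text with the eighth bit stripped stays roughly readable as a Latin transliteration with inverted case ("Привет" becomes "pRIWET"), but such text still counts as a failure.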
4.1 Evaluation of Multilingual features/settings of MUAs
Typically, an MUA's settings include the following parameters and attributes for the different operation modes:
4.2 Message Reading procedure
Testing the multilingual support of MUAs usually starts with reading received messages or the content of a merged mailbox. When reading messages, multilingual MUAs should support the following features:
4.3 Message Composing procedure
In real life, message composition requires active interaction with the user's whole working environment. In fact, it includes operations such as:
4.4 Sending/Receiving Messages procedures
The following are common sources of problems in sending and receiving messages in a multilingual environment:
5.0 Test Messages Set
Each test will be performed in at least two character sets: US-ASCII (or ISO 8859-1), and one containing characters that are not part of US-ASCII or ISO 8859-1. Optionally, some tests of support for multibyte character sets (UCS ISO 10646, ISO CJK - Chinese, Japanese, Korean) will be provided for MUAs that support these types of encoding.
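Choosing samples for the two character sets can be automated. The following Python sketch, with hypothetical helper names and Python codec names standing in for the MIME charset labels, checks that the second sample really needs a charset other than US-ASCII and ISO 8859-1.

```python
def representable(text: str, codec: str) -> bool:
    """True if every character of `text` exists in charset `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def valid_second_sample(text: str, codec: str) -> bool:
    """A sample for the second test charset must fit in `codec` but must
    contain characters outside ISO 8859-1 (and hence outside US-ASCII)."""
    return representable(text, codec) and not representable(text, "latin_1")
```

For example, "Café" is rejected because ISO 8859-1 already covers it, while "Zażółć" in ISO 8859-2 or "Привет" in KOI8-R qualify; the optional multibyte case works the same way, e.g. `valid_second_sample("日本語", "iso2022_jp")`.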
Additional tests will be provided for UTF-8/UTF-7 support, in particular to verify that the MUAs' internal conversion between Unicode and non-Unicode character sets is correct.
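A possible check of that conversion, assuming Python's `utf_7`/`utf_8` codecs serve as the reference implementation: legacy text is converted to UTF-7 or UTF-8 and back, and the result must match the original bytes. The function name is hypothetical.

```python
import codecs


def unicode_transfer_ok(legacy: bytes, legacy_codec: str, utf_codec: str) -> bool:
    """Check that legacy charset -> UTF-7/UTF-8 -> legacy charset is lossless.

    For UTF-7 it also checks that the intermediate form is pure 7-bit ASCII,
    which is the purpose of UTF-7 (RFC 2152).
    """
    text = legacy.decode(legacy_codec)          # legacy bytes -> Unicode
    utf = text.encode(utf_codec)                # Unicode -> UTF-7 / UTF-8
    if codecs.lookup(utf_codec).name == "utf-7" and not utf.isascii():
        return False
    return utf.decode(utf_codec).encode(legacy_codec) == legacy
```

Usage: `unicode_transfer_ok("Привет".encode("koi8_r"), "koi8_r", "utf-7")`.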
The test message set consists of:
Type | Message | Content
Mandatory | tmsg1 | Message with non-ASCII characters/text in the Subject line
Mandatory | tmsg2 | Message with non-ASCII characters/text in the Mail Address free-form name
Mandatory | tmsg3 | Message with non-ASCII characters/text in the Message Body text (single part)
Mandatory | tmsg4 | Message with non-ASCII characters/text in a text/plain attachment
Optional | tmsg5* | Message with ASCII and non-ASCII characters/text, with a non-Western language/encoding as the MUA's default setting
Optional | tmsg6* | Message with the UTF-7/UTF-8 character set in the message body and header
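A possible generator for the mandatory messages, sketched with Python's standard `email` package. The function name, the ASCII filler text, the attachment filename `tmsg4.txt` and the example addresses are assumptions, not part of the project's test set. Python encodes the non-ASCII header text as RFC 2047 encoded-words and, for KOI8-R, the bodies in base64.

```python
from email.header import Header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr

ASCII_BODY = "This part contains ASCII text only.\n"


def _single_part(body: str, body_charset: str, subject) -> MIMEText:
    """One text/plain message; `subject` may be a str or an email Header."""
    msg = MIMEText(body, "plain", body_charset)
    msg["Subject"] = subject
    return msg


def build_test_messages(sample: str, charset: str, sender: str, recipient: str) -> dict:
    """Build tmsg1-tmsg4; `sample` is non-ASCII text representable in `charset`."""
    msgs = {
        # tmsg1: non-ASCII text in the Subject line (RFC 2047 encoded-word)
        "tmsg1": _single_part(ASCII_BODY, "us-ascii", Header(sample, charset)),
        # tmsg2: non-ASCII text in the free-form name (set in the loop below)
        "tmsg2": _single_part(ASCII_BODY, "us-ascii", "tmsg2"),
        # tmsg3: non-ASCII text in a single-part body
        "tmsg3": _single_part(sample + "\n", charset, "tmsg3"),
    }
    # tmsg4: non-ASCII text in a text/plain attachment
    tmsg4 = MIMEMultipart()
    tmsg4["Subject"] = "tmsg4"
    tmsg4.attach(MIMEText(ASCII_BODY, "plain", "us-ascii"))
    attachment = MIMEText(sample + "\n", "plain", charset)
    attachment.add_header("Content-Disposition", "attachment", filename="tmsg4.txt")
    tmsg4.attach(attachment)
    msgs["tmsg4"] = tmsg4

    for name, msg in msgs.items():
        display_name = sample if name == "tmsg2" else name
        msg["From"] = formataddr((display_name, sender), charset)
        msg["To"] = recipient
    return msgs
```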
Another source of multilingual texts will be the regional servers of some international software producers:
Microsoft http://www.microsoft.com
Alis Technologies http://www.alis.com
6.0 Testing Methodology
The basic testing scheme includes the MUAs under test and all the tools needed to generate and check the test messages, i.e. the message constructor/inspector tool, the tools for typing in test messages by keyboard input or by Copy&Paste, and the special test message generator tool, which enables the use of a special set of test messages (see Fig. 1).
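A minimal inspector in the spirit of the message constructor/inspector tool, assuming Python's standard `email` parser. It only reports which charset labels and transfer encodings a received message carries, so that a tester can compare them with what was sent. The function name and report layout are hypothetical.

```python
from email import message_from_bytes
from email.header import decode_header


def inspect_message(raw: bytes) -> dict:
    """Report the charset labels a received message actually carries.

    header_charsets: charsets named in RFC 2047 encoded-words, per header.
    parts: (content type, declared charset, transfer encoding) per leaf part.
    """
    msg = message_from_bytes(raw)
    report = {"header_charsets": {}, "parts": []}
    for name in ("Subject", "From", "To"):
        value = msg.get(name)
        if value is not None:
            charsets = {cs for _, cs in decode_header(value) if cs}
            report["header_charsets"][name] = sorted(charsets)
    for part in msg.walk():
        if part.is_multipart():
            continue   # containers carry no text of their own
        report["parts"].append((
            part.get_content_type(),
            part.get_content_charset(),                          # None if undeclared
            part.get("Content-Transfer-Encoding", "7bit").lower(),  # RFC 2045 default
        ))
    return report
```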
The tests required to be performed are:
test-1 - Receive all 4 test messages tmsg1-tmsg4 and display them correctly (Change Mail Reader Language/Alphabet/Encoding Options if needed)
test-2 - Print all 4 messages tmsg1-tmsg4 to the standard printer
test-3 - Reply to messages tmsg1 and tmsg2, and check that information is returned in the same character set as it arrived in
test-4 - Reply to message tmsg3 using the "reply including quote of body" function of the IUT (implementation under test)
test-5 - Reply to message tmsg3 using the environment's "cut and paste" function to insert the non-ASCII characters into the outgoing message
test-6 - Forward all 4 messages to the originator's address
test-7 - Generate, as completely as possible, the same messages from the keyboard of the IUT
test-8* - Check for possible text distortion when exchanging message tmsg5* between MUAs with different (non-ASCII) default Language/Alphabet/Encoding settings
test-9* - Perform tests 1-5 for message tmsg6* with the UTF-7/UTF-8 character set
The result of each test is expressed as "pass", "fail" or "maybe", together with a possible explanation of the problems discovered. The final results for the tested group of MUAs will be provided together with recommendations on how to use the evaluated MUAs properly, i.e. how to set their parameters correctly, etc.
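The results could be recorded as in the following Python sketch. The class names and the rule that a "fail" or "maybe" verdict must carry an explanation are assumptions drawn from the description above, not the project's actual result format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    MAYBE = "maybe"


@dataclass
class ResultRecord:
    test_id: str            # e.g. "test-1" ... "test-9*"
    verdict: Verdict
    explanation: str = ""


@dataclass
class MuaReport:
    mua: str
    records: List[ResultRecord] = field(default_factory=list)

    def add(self, test_id: str, verdict: Verdict, explanation: str = "") -> None:
        # Assumption: every non-pass verdict must carry an explanation.
        if verdict is not Verdict.PASS and not explanation:
            raise ValueError(f"{test_id}: a '{verdict.value}' result needs an explanation")
        self.records.append(ResultRecord(test_id, verdict, explanation))

    def summary(self) -> Dict[str, int]:
        """Number of pass/fail/maybe verdicts recorded for this MUA."""
        counts = {v.value: 0 for v in Verdict}
        for record in self.records:
            counts[record.verdict.value] += 1
        return counts
```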
7.0 Conclusion
The international environment of the project and its wide geographical range will make it possible to discover the main problems in the current multilingual support of MUAs, including problems that arise from the use of different languages, cultural content and usage practices within the European multilingual community. Notwithstanding the expected wide implementation of Unicode, problems still show up clearly when working in an environment with multiple languages and character sets: thus, from the user's point of view, there are still many issues that require solutions for the benefit of the community.
The TERENA evaluation project will provide a detailed set of test
results, a set of multilingual test messages and a recommended testbed
for forthcoming MUAs, which will help developers and users in their daily
work. Another interesting by-product of the project will be a set of
online MUA test tools. The current version of these test tools
is located at the following URL:
http://park.kiev.ua/multiling/ml-mua/testcon.html
Acknowledgment
We wish to express our special acknowledgments to Peter Heijmens Visser
from TERENA, who provided MUA usage statistics based on a detailed analysis
of the TERENA Mail archive for the period preceding the project. Moreover,
all our appreciation goes to Harald T. Alvestrand, from Maxware Norway,
for his original idea for the project itself, and for his valuable support
during the definition of the project details.
Appendix A. List of MUAs to be tested
Based on MUA usage statistics in the TERENA community for a sample of more than 3000 messages.
An updated version is available at http://park.kiev.ua/multiling/ml-mua/mua-statist.html
MS Windows MUA | Uses per 1000 | UNIX MUA | Uses per 1000
Netscape Mozilla (All Versions) | 201 | ELM | 67
Windows Eudora | 165 | pine | 40
MS Mailers (All Versions) | 78 | exmh | 20
 | 14 | Netscape Mail for UNIX | 13
 | 31 | Z-Mail |
Pegasus | 84 | |
The Bat! by RIT Research Labs. | | |

Additional list | Uses per 1000
Simeon | 17
Forte Agent | 28
Alis Tango Mailer |