Testing multilingual support in Mail User Agents @ 13th Unicode by Yuri Demchenko, Konstantin Chuguev, Borka Jerman-Blazic, Claudio Allocchio

Testing multilingual support in Mail User Agents

Yuri Demchenko
Kiev Polytechnic Institute, Ukraine
<demch@cad.ntu-kpi.kiev.ua>

Konstantin Chuguev
Ural Technical University, Russia
<joy@URC.AC.RU>

Borka Jerman-Blazic
Jozef Stefan Institute, Slovenia
<borka@e5.ijs.si>

Claudio Allocchio
Sincrotrone Trieste & INFN Trieste, Italy
<Claudio.Allocchio@elettra.trieste.it>

Abstract

The paper describes the TERENA Pilot Project on Multilingual MUA Testing. It discusses the problems of multilingual support in Mail User Agents (MUAs) operating in a multilingual environment such as Europe, and the general model of multilingual support in MUAs applied in the project. The methodology used and the recommendations for basic and extended testing of MUA support for multilinguality are also described.

1.0 Introduction
2.0 Understanding multilingual support
3.0 The TERENA MUAs multilingual support evaluation pilot project
4.0 The testing and the evaluation scheme
4.1 Evaluation of Multilingual features/settings of MUAs
4.2 Message Reading procedure
4.3 Message Composing procedure
4.4 Sending/Receiving Messages procedures
5.0 Test Messages Set
6.0 Testing Methodology
7.0 Conclusion
8.0 References
Acknowledgment
Appendix A. List of MUAs to be tested

1.0 Introduction

Notwithstanding the advent of Unicode, non-English-speaking users usually work in a local multilingual (ML) environment (especially in the case of e-mail) that is not Unicode-based and will remain so for a long time. Current practice in ML MUAs is to use an internal Unicode representation of ML texts/messages, providing conversion or mapping between multiple character sets/languages.
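The internal-Unicode approach can be illustrated with a minimal Python sketch, assuming Python 3 and its standard codecs `koi8_r` and `cp1251`: the text is decoded from its source character set into Unicode (Python's `str`) and re-encoded into the target character set. The function name `convert` is hypothetical.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text from one coded character set to another,
    using Unicode (Python str) as the internal pivot representation.

    Raises UnicodeDecodeError/UnicodeEncodeError if a character is
    not representable, instead of silently distorting the text.
    """
    text = data.decode(source)   # source charset bytes -> Unicode
    return text.encode(target)   # Unicode -> target charset bytes


# The Russian word "Привет" ("hello") as produced by a KOI8-R mail client
koi8 = "Привет".encode("koi8_r")

# The same text re-encoded for a Windows (CP1251) environment
cp1251 = convert(koi8, "koi8_r", "cp1251")
print(koi8.hex(" "), "->", cp1251.hex(" "))
```

Strict error handling is a deliberate choice here: a character missing from the target set raises an exception rather than being replaced, so a lossy conversion is visible during testing.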

The importance of support for internationalization (i18n) in all Internet services, especially the WWW and e-mail, is widely recognized by many international forums. Recently, a strategy for supporting multilinguality in all Internet services was developed and adopted by the IAB and invited experts, and the result was published as an Informational RFC (RFC 2130). The basic architectural model and the basic technologies have also been adopted by many organizations; however, the applications used over the Internet do not yet support the proposed models and services at a satisfactory level. This is why internationalization problems are still a major discussion topic in many working groups and technical committees, such as ISO/IEC JTC 1/SC 22/WG 20, TC2, the TERENA WG-i18n [1] and WG-MSG [10], CEN/TC304 and CEN/ISSS [2]. A special project addressing the implementation of internationalization standards and best practice in e-mail applications is also currently being undertaken by the IMC (Internet Mail Consortium) [3].

Today, the basic standardization work for supporting multilinguality in e-mail applications is fairly well established: the MIME (Multipurpose Internet Mail Extensions) set of standards gives clear specifications for using any language in message body parts and in various header fields without any danger that the underlying e-mail network infrastructure will damage the content. The key information in the header fields that prevents damage in transport identifies the coded character set used and the transfer encoding scheme, such as base64 (see RFC 2045 to RFC 2049).
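As an illustration of both mechanisms, the following Python sketch uses Python 3's standard `email` package; the sample texts are arbitrary. It labels a body part with `charset=koi8-r`, lets the package choose a 7-bit-safe transfer encoding, and places a non-ASCII Subject in an RFC 2047 encoded-word.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText

BODY = "Тестовое сообщение"  # Russian: "test message"
SUBJECT = "Тест"             # Russian: "test"

# Body part: the charset parameter of Content-Type names the coded
# character set; the email package also adds a Content-Transfer-Encoding
# header so that the 8-bit KOI8-R bytes travel as 7-bit text.
msg = MIMEText(BODY, "plain", "koi8-r")

# Header field: non-ASCII text is carried as an RFC 2047 encoded-word,
# =?charset?encoding?encoded-text?=
msg["Subject"] = Header(SUBJECT, "koi8-r").encode()

wire = msg.as_string()
print(wire)

# A receiving MUA reverses both steps using the declared charset.
print(str(make_header(decode_header(msg["Subject"]))))
print(msg.get_payload(decode=True).decode(msg.get_content_charset()))
```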

However, in practice e-mail user agents (MUAs) provide different levels of multilingual support. There is also no standard scheme for assessing their functionality and their level of compliance with the existing standards and recommendations. As a consequence, a common testing methodology (a multilingual MUA benchmark) has not yet been developed and is not yet available.

Developers usually test the multilingual operation of an MUA in a very simple way: only the most basic operations are evaluated, such as typing, editing, sending, receiving/viewing, replying to and forwarding messages. Operations that are very important from the user's point of view, and that are routinely performed when working in a multilingual environment, remain untested. Typical operations missing from manufacturers' testing of MUA software are:

message composition using clipboard operations (cut, paste, encrypt, translate, etc.)
use of languages and character sets in the Address and Subject fields that differ from those of the message body part
document attachment

In this context, many user-friendly products offering rich and complex facilities appear to cause more trouble than simple ones. Users of such products usually need many more instructions and recommendations on how to set up their local/national language support and how to use it even in a simple message. The problem becomes even more evident when the local community uses more than one character set or encoding for its languages (or alphabets). Last but not least, we should also remember that many Western European languages, which are usually considered encodable without problems in US-ASCII, do in fact require special support for some of their “special characters” in order to be spelled correctly.

2.0 Understanding multilingual support

One of the most relevant factors contributing to the existing confusion in the provision of multilingual content is the large number of existing international and proprietary standards: they complicate content creation and viewing and, as a consequence, also the development and testing of internationalized software for Internet services.

Multilingual Internet products are not just multilingual user interfaces that allow typing and reading messages in selected languages. Multilingual support in Internet mail systems, for example, assumes the use of a set of content-safe mail techniques and standards that make it possible to compose, transfer and read e-mail messages containing information in different languages, character sets and encodings. In addition, such systems must support negotiation between MUAs and MTAs regarding the languages, character sets and encodings used for each message being transferred. The Internet e-mail service is not interactive: it is a store-and-forward service. Therefore, multilingual negotiation here differs from the negotiation usually applied between World Wide Web servers and clients: on the Web, a client can state its preferences (for example with the HTTP Accept-Charset header) before the server responds, whereas a mail originator generally cannot query the recipient's capabilities at sending time.

The essential method for verifying multilingual support by a set of MUAs and MTAs over the Internet e-mail service consists of examining the processing of a test message composed of a set of codes representing the originator’s language and alphabet. When the test message is sent over a number of heterogeneous servers and multilingual support is correct, the recipient must receive a readable message identical to the one sent by the originator. The key to correct delivery of multilingual content in electronic mail is to ensure that the message composing software (content creation software) and the client's message reader apply the same convention for mapping the message’s codes to displayable characters (mapping bit combinations to glyphs).
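The following Python sketch, using Python 3's standard codecs, shows what happens when the two ends do not share the same convention. One byte sequence, produced by a KOI8-R composer, is interpreted by readers that assume three different Cyrillic character sets.

```python
# Bytes written by a composer that uses KOI8-R
data = "Привет".encode("koi8_r")

# Readers that assume different code-to-glyph conventions
for charset in ("koi8_r", "cp1251", "iso8859_5"):
    shown = data.decode(charset, errors="replace")
    verdict = "OK" if shown == "Привет" else "distorted"
    print(f"{charset:10} {shown!r:12} {verdict}")
```

Only the reader that applies the composer's own convention reproduces the original word. The others display valid but wrong characters, which is why this kind of distortion is easy to miss without a reference text.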

According to CEN/TC304, there are 160 indigenous European languages. The Internet-literate European multilingual community uses more than 30 languages, represented by many character sets with different repertoires and different encodings. A property common to all of them is the use of the character-box (or glyph-box) representation and of single-byte character sets (SBCS), i.e., each character is encoded in one byte and occupies one displayable position. This distinguishes them from many languages used outside Europe.
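The single-byte property can be checked with a small Python sketch. The sample words and the choice of ISO 8859 parts are illustrative assumptions. In each of these character sets every character of the text maps to exactly one byte, whereas a multibyte encoding such as UTF-8 needs more bytes for the same non-ASCII text.

```python
samples = {
    "iso8859_1": "Français",   # Western European
    "iso8859_2": "Dvořák",     # Central European
    "iso8859_5": "Привет",     # Cyrillic
    "iso8859_7": "Ελλάδα",     # Greek
}

for charset, text in samples.items():
    sbcs = text.encode(charset)
    utf8 = text.encode("utf-8")
    # SBCS: one byte per character; UTF-8: one or more bytes per character
    print(f"{charset:10} chars={len(text)} sbcs={len(sbcs)} utf8={len(utf8)}")
```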

Most European languages use the Latin script, whose base is the 26 letters of the English alphabet (A through Z) in upper and lower case. Some European languages, such as French, Spanish or Icelandic, add further letters. Many more characters are treated as composites of a base letter and a diacritical mark, as specified in a few basic ISO standards (such as ISO 6937). Fourteen diacritical marks, commonly called "accent marks", complete the set for European languages and support nearly 200 diacritical combinations [4].
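Unicode expresses the same composition model through canonical decomposition. The Python sketch below uses the standard `unicodedata` module, and its sample word is chosen for illustration. It splits accented letters into a base letter followed by combining diacritical marks. Note that Unicode places the combining mark after the base letter, while ISO 6937 encodes the non-spacing diacritic before it.

```python
import unicodedata

def decompose(word: str) -> list[tuple[str, list[str]]]:
    """Split each letter into (base letter, [names of combining diacritics])."""
    result = []
    for ch in word:
        base, *marks = unicodedata.normalize("NFD", ch)
        result.append((base, [unicodedata.name(m) for m in marks]))
    return result

for base, marks in decompose("Dvořák"):
    print(base, marks)

# NFC recomposes base letter + mark into the precomposed character.
print(unicodedata.normalize("NFC", "r\u030c") == "ř")
```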

The repertoires of the official languages of the European Union (EU) member states that use the Latin alphabet are specified in ISO 8859-1, while the repertoires of the Central and Eastern European languages using the Latin alphabet are specified in ISO 8859-2. The Greek alphabet is specified in ISO 8859-7, and the Cyrillic alphabet used in Europe in ISO 8859-5 [5]. The most widely used operating systems, such as UNIX and Microsoft Windows, use their own character set codes (for example the Windows code pages 1250-1258, also called "ANSI" code pages) to support European languages, including the Cyrillic languages (Russian, Ukrainian, Belorussian, Bulgarian, etc.) in CP1251 [6]. The de facto standards for mail and news exchange, as well as for WWW information, in the Russian- and Ukrainian-speaking communities are KOI8-R (RFC 1489) and KOI8-U (RFC 2319). These different character set codes, implemented in different operating systems, are the main source of incompatibility in the message content produced by MUAs running on these systems.
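The incompatibility between these Cyrillic character sets can be made concrete with a Python sketch. It assumes Python 3's codecs `iso8859_5`, `cp1251`, `koi8_r` and `koi8_u`, and the sample words are illustrative. The same letter has a different code in each family of sets, and a Ukrainian word containing "ї" cannot be represented in KOI8-R at all. KOI8-U (RFC 2319) fills that gap.

```python
from typing import Optional

CYRILLIC_CHARSETS = ("iso8859_5", "cp1251", "koi8_r", "koi8_u")

def encode_or_none(text: str, charset: str) -> Optional[bytes]:
    """Encode text, returning None if the charset lacks a character."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

for word in ("Ж", "Київ"):  # a Cyrillic letter; the Ukrainian name of Kiev
    for cs in CYRILLIC_CHARSETS:
        encoded = encode_or_none(word, cs)
        shown = encoded.hex(" ") if encoded is not None else "not representable"
        print(f"{word:5} {cs:10} {shown}")
```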

3.0 The TERENA MUAs multilingual support evaluation pilot project

The Pilot Project to evaluate the most popular multilingual MUAs was launched at the beginning of 1998 by TERENA WG-MSG and WG-i18n [7]. The project's major goal is to provide its user community (within the TERENA technical program) with consistent information and practical recommendations about the level and quality of multilingual support in these widely used MUAs. The project officially started in April 1998 and is expected to deliver its results by September 1998. By means of evaluation tests, the project will specify the properties of each particular MUA and provide the necessary recommendations and instructions for configuring it correctly.

The tests will be performed on a number of MUAs in order to determine their behavior when configured to work with different national character sets. The list of mail clients to be tested was derived from TERENA MUA usage statistics, based on an analysis of more than 3000 messages from the TERENA mail archives collected between August 1997 and March 1998. The list was approved by the TERENA Working Group for Internationalization of network services (WG-I18N) and the TERENA Working Group for Mail and Messaging (WG-MSG). An additional survey performed during the first three months of 1998 confirmed the general distribution already known, so the list of products was not changed; only a number of newly released products were added to the test list. A summary of the collected statistics on MUA use is given in Appendix A, and more detailed information can be retrieved from the Pilot Project’s homepage [8].

The project plans to evaluate the MUAs in general, but some evaluation of local e-mail encoding practice will also be provided for special cases with a large user community. This applies to countries that use special coded character sets for their languages, e.g. Russia, Ukraine and Belarus (for DOS, Windows, UNIX, Macintosh, etc.), and some Central European countries such as Poland, Slovenia, the Czech Republic and Hungary.

In addition to the project’s main goals, another expected result is a proposal for a basic, common test scheme for multilingual MUA benchmarking. This part of the project will be based on the experience collected by the test teams and the problems they encounter while performing the evaluations. A special set of composite multilingual test messages will also be developed in collaboration with test teams in several countries (e.g. in Central European countries, Russia and Ukraine) so that interoperability tests can be performed smoothly and easily.

The project activities are currently being carried out by an international team including representatives from Ukraine, Russia, Slovenia and the Scandinavian countries. Other participants are welcome.

The project objectives are to:

Develop a benchmarking methodology for multilingual MUAs, and specify templates for collecting the results in a coherent way.
Design a set of composite multilingual test messages to test support for multiple languages in MUAs.
Configure each MUA for all supported national character sets and send the test messages to other MUAs and to itself.
Compile the results, analyzing how each MUA composes, sends, receives and displays the test messages.
Prepare recommendations for users and define instructions for the correct setup and operation of some popular multilingual MUAs, in order to avoid incorrect delivery (“distortion”) of sent messages.

The project’s homepage, which provides all the necessary information about the project and a collection of reference documents and standards, has been established at

http://park.kiev.ua/multiling/ml-mua/

The results of the project will be made available on the TERENA WWW pages and presented at various Internet-related conferences and user training workshops in cooperation with TERENA WG-ISUS and TERENA WG-MSG.

4.0 The testing and the evaluation scheme

A general model for verifying multilingual support in MUAs is presented in Fig. 1. The model, developed within the Pilot Project itself, presents as a flow chart the whole process envisaged for testing and evaluation, i.e. message composing, sending, receiving and reading. Its various entries make it possible to set up the multilingual features and to trace the possible locations of problems.

Fig. 1. General model of multilingual MUA benchmarking.

In particular, the testing of multilingual support in MUAs includes the following phases:

Evaluation of Multilingual features/settings of MUAs
Testing Message Reading procedure
Testing Message Composing procedure
Testing Message Sending and Receiving procedure

More detailed descriptions of each phase are given in the following subsections.

4.1 Evaluation of Multilingual features/settings of MUAs

Typically, an MUA's settings include the following parameters and attributes for the different operation modes:

READ operation mode

choose Language/Encoding
choose Fonts (Optional for Address, Subject, Message Body, Quoted Text)

Optional - Font mapping

COMPOSE operation mode

choose Language/Encoding Settings

Optional - Possibility to switch Language/Encoding during composition/typing and for different sections in the message

choose Fonts (Optional for Address, Subject, Message Body, Quoted Text)
Optional - choose Spelling/Language/Dictionary

SEND operation mode

set MIME transfer encoding (Quoted-Printable, Base64)
select/disable Uuencode mode (non-standard)
Allow/disallow 8-bit in Header Fields
select/disable HTML in bodyparts
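For collecting results in a coherent way, the settings above could be recorded per MUA in a small data structure. The Python sketch below is a hypothetical template. The class and field names are our own assumptions, mirroring the READ, COMPOSE and SEND parameters listed above, and are not taken from any MUA. "ExampleMailer" is a placeholder name.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class MuaSettings:
    """Hypothetical record of one MUA configuration used in a test run."""
    mua: str
    # READ mode
    read_charset: str = "iso-8859-1"
    read_fonts: List[str] = field(default_factory=list)
    # COMPOSE mode
    compose_charset: str = "iso-8859-1"
    can_switch_charset_while_typing: bool = False
    spelling_dictionary: Optional[str] = None
    # SEND mode
    transfer_encoding: str = "quoted-printable"   # or "base64"
    uuencode: bool = False                          # non-standard
    allow_8bit_headers: bool = False
    html_bodyparts: bool = False

    def __post_init__(self) -> None:
        # Only the two MIME transfer encodings listed above are accepted.
        if self.transfer_encoding not in ("quoted-printable", "base64"):
            raise ValueError(f"unsupported MIME encoding: {self.transfer_encoding}")

setup = MuaSettings(mua="ExampleMailer", read_charset="koi8-r",
                    compose_charset="koi8-r", transfer_encoding="base64")
print(asdict(setup))
```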

4.2 Message Reading procedure

Testing the multilingual support of MUAs usually starts with reading received messages or merged mailbox content. When reading messages, multilingual MUAs should support the following features:

Reading/Displaying non-ASCII characters in Message Body
Reading/Displaying non-ASCII characters in Message Header (Address, Subject Lines)
Reading Forwarded Message with non-ASCII characters in Address, Subject, Message Body, using the same or different MIME character set attributes
Reading Attached non-ASCII Text File (Document)

Possible problems are detected by comparing the appearance of the original and the delivered test messages; this includes evaluating whether the MUA processes the MIME attributes of the test message correctly.
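Part of this comparison can be automated on the receiving side. The following Python sketch uses the standard `email` package; the function name `check_delivery` and its problem strings are hypothetical. It decodes the Subject and body of a delivered message using the message's own MIME attributes and reports every point where the result differs from the original text. The usage part simulates a gateway that relabels the charset without converting the bytes.

```python
from email import message_from_bytes
from email.header import decode_header, make_header
from email.mime.text import MIMEText
from typing import List

def check_delivery(raw: bytes, subject: str, body: str) -> List[str]:
    """Return a list of problems found in a delivered single-part message."""
    problems = []
    msg = message_from_bytes(raw)
    charset = msg.get_content_charset()
    if charset is None:
        problems.append("no charset parameter in Content-Type")
        charset = "us-ascii"  # RFC 2045 default for text/plain
    got_subject = str(make_header(decode_header(msg.get("Subject", ""))))
    if got_subject != subject:
        problems.append(f"Subject distorted: {got_subject!r}")
    payload = msg.get_payload(decode=True) or b""
    try:
        got_body = payload.decode(charset, errors="replace")
    except LookupError:
        problems.append(f"unknown charset {charset!r}")
        return problems
    if got_body != body:
        problems.append(f"body distorted: {got_body!r}")
    return problems

# Simulated deliveries: one intact, one whose charset label was changed
# to windows-1251 while the KOI8-R bytes were left untouched.
sent = MIMEText("Привет", "plain", "koi8-r")
sent["Subject"] = "Test"
good = sent.as_bytes()
bad = good.replace(b'charset="koi8-r"', b'charset="windows-1251"')
print(check_delivery(good, "Test", "Привет"))
print(check_delivery(bad, "Test", "Привет"))
```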

4.3 Message Composing procedure

In real life, message composition requires active interoperation with the user’s whole working environment. It includes operations such as:

Typing non-ASCII text in Message Body
Typing non-ASCII text in Message Header (Address and Subject Line)
Replying to a message with non-ASCII text
Pasting non-ASCII text into body and header fields
Forwarding a message with non-ASCII content
Attaching text documents containing non-ASCII characters
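When text is typed, pasted or attached from different sources, the MUA has to choose an outgoing character set that covers everything in the message. Below is a minimal Python sketch of one common strategy. It assumes a hypothetical user-configured preference list, tries each candidate in order, and falls back to UTF-8 when no listed set can represent the whole text.

```python
from typing import Sequence

# Hypothetical user preference order; UTF-8 is the last-resort fallback.
PREFERRED = ("us-ascii", "koi8-r", "koi8-u", "iso-8859-2", "iso-8859-7")

def choose_charset(parts: Sequence[str], preferred: Sequence[str] = PREFERRED) -> str:
    """Return the first charset that can encode every part (headers and body)."""
    for charset in preferred:
        try:
            for part in parts:
                part.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"

# Subject and body of a message mixing pasted Russian and Greek text
print(choose_charset(["Привет", "Ελλάδα"]))
```

Preferring a specific legacy set over UTF-8 keeps the message readable by recipients whose MUAs do not support Unicode. Pasting text in a second script silently changes the outcome, which is exactly what the clipboard tests are meant to expose.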

4.4 Sending/Receiving Messages procedures

The following exchange scenarios are common sources of problems when sending and receiving messages in a multilingual environment:

Exchange messages with an identical MUA (both self carbon copy and a remote identical MUA)
Exchange messages with different MUAs (local ones, using the same MTA, and remote ones, where different MTAs are usually also involved)
External exchange of messages with multilingual content in the message
Crossing Message Transport System (MTS) Gateways (like gateways to LAN mail systems and in general to non-SMTP transport).
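The damage a gateway that is not 8-bit clean can cause is easy to simulate. In the Python sketch below, the hypothetical function `strip_high_bit` stands in for a gateway that clears the eighth bit of every byte. Raw KOI8-R text is irreversibly changed, whereas the same text sent with base64 transfer encoding consists only of 7-bit characters and passes unharmed.

```python
import base64

def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit-only gateway that clears bit 8 of every byte."""
    return bytes(b & 0x7F for b in data)

original = "Привет".encode("koi8_r")

# Unencoded 8-bit body: every Cyrillic letter loses its high bit.
damaged = strip_high_bit(original)
print(damaged)

# base64-encoded body: all bytes are already 7-bit, so nothing changes.
encoded = base64.b64encode(original)
restored = base64.b64decode(strip_high_bit(encoded))
print(restored.decode("koi8_r"))
```

KOI8-R was designed so that stripping the eighth bit leaves a rough, case-inverted Latin transliteration (here `b'pRIWET'`). The text is still lost for the reader, however, and the other character sets discussed above degrade into meaningless characters.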

5.0 Test Messages Set

Each test will be performed in at least two character sets: US-ASCII (or ISO 8859-1), and one containing characters that are not part of US-ASCII or ISO 8859-1. Optionally, some tests of multibyte character set support (UCS/ISO 10646; CJK: Chinese, Japanese, Korean) will be provided for MUAs that support these types of encoding.

Additional tests will be provided for UTF-8/UTF-7 support, in particular to verify that the MUA's internal conversion between Unicode and non-Unicode character sets is correct.
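A round-trip check of this kind can be sketched in Python. It uses the standard codecs `utf_7`, `utf_8` and the legacy charsets named below, and the helper name `round_trip` is hypothetical. Text converted from a legacy character set into a Unicode transformation format and back must come out byte-for-byte identical, and the UTF-7 form must contain only 7-bit bytes.

```python
def round_trip(data: bytes, legacy: str, utf: str) -> bytes:
    """legacy bytes -> Unicode -> UTF-7/UTF-8 bytes -> Unicode -> legacy bytes."""
    transformed = data.decode(legacy).encode(utf)
    return transformed.decode(utf).encode(legacy)

samples = {"koi8_r": "Привет", "iso8859_2": "Dvořák", "iso8859_7": "Ελλάδα"}

for legacy, text in samples.items():
    data = text.encode(legacy)
    for utf in ("utf_7", "utf_8"):
        ok = round_trip(data, legacy, utf) == data
        print(f"{legacy:10} via {utf}: {'identical' if ok else 'CHANGED'}")

# UTF-7 uses only 7-bit bytes, so it can cross 7-bit transports unencoded.
utf7 = "Привет".encode("utf_7")
print(utf7)
```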

The test message set consists of:

Mandatory:

tmsg1 - Message with non-ASCII characters/text in the Subject line
tmsg2 - Message with non-ASCII characters/text in the free-form name of the mail address
tmsg3 - Message with non-ASCII characters/text in the message body text (single part)
tmsg4 - Message with non-ASCII characters/text in a text/plain attachment

Optional:

tmsg5* - Message with ASCII and non-ASCII characters/text, with a non-Western language/encoding as the MUA's default setting
tmsg6* - Message with the UTF-7/UTF-8 character set in the message body and header
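The four mandatory messages can be generated with Python 3's standard `email` package, as in the sketch below. The sketch rests on several assumptions that are not part of the project's tools: KOI8-R as the tester's character set, a Russian sample sentence, the example.org addresses, and the function names `make_test_message` and `header_text`.

```python
from email.header import Header, decode_header, make_header
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr

CHARSET = "koi8-r"  # assumption: the tester's chosen non-ASCII character set
SAMPLE = "Съешь же ещё этих мягких французских булок, да выпей чаю"
SHORT = SAMPLE[:12]  # "Съешь же ещё", used in header fields

def make_test_message(kind: str) -> Message:
    """Build one of tmsg1..tmsg4 (addresses are hypothetical)."""
    if kind == "tmsg3":    # non-ASCII text in a single-part body
        msg = MIMEText(SAMPLE, "plain", CHARSET)
    elif kind == "tmsg4":  # non-ASCII text in a text/plain attachment
        msg = MIMEMultipart()
        msg.attach(MIMEText("See the attached file.", "plain", "us-ascii"))
        part = MIMEText(SAMPLE, "plain", CHARSET)
        part.add_header("Content-Disposition", "attachment", filename="tmsg4.txt")
        msg.attach(part)
    else:
        msg = MIMEText("ASCII body.", "plain", "us-ascii")
    # tmsg1: non-ASCII Subject (RFC 2047 encoded-word)
    msg["Subject"] = Header(SHORT, CHARSET).encode() if kind == "tmsg1" else kind
    # tmsg2: non-ASCII free-form name in the From address
    name = SHORT if kind == "tmsg2" else "ML-MUA Tester"
    msg["From"] = formataddr((name, "tester@example.org"), charset=CHARSET)
    msg["To"] = "recipient@example.org"
    return msg

def header_text(value: str) -> str:
    """Decode RFC 2047 encoded-words back to Unicode."""
    return str(make_header(decode_header(value)))

for kind in ("tmsg1", "tmsg2", "tmsg3", "tmsg4"):
    msg = make_test_message(kind)
    print(kind, header_text(msg["Subject"]), "|", header_text(msg["From"]))
```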

The following shortcuts are also envisaged:

Test Messages tmsg1-tmsg2-tmsg3 could be combined in tmsg123 (if no interference in MUA is detected)
MUAs will be tested with the main character sets/encodings supported by each particular MUA (all of them, if possible) and for the main groups of European languages (Western (default), Central European, Scandinavian, Cyrillic)
The choice of character set is up to the tester, but messages tmsg3 and tmsg4 should include the full character repertoire in the body, if possible.
tmsg4 should consist of attached documents and, optionally, an attached/forwarded message with non-ASCII characters/text
Additional tests should be provided if the need for them is identified during the examination of the MUAs’ multilingual support.
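For tmsg3 and tmsg4, the full repertoire of a single-byte character set can be generated mechanically. The Python sketch below uses the standard codecs, and the function names and the 16-characters-per-line layout are our own choices. It decodes every code position from 0xA0 to 0xFF and skips positions that the character set leaves unassigned.

```python
def repertoire(charset: str, start: int = 0xA0) -> str:
    """Return all assigned characters of a single-byte charset in [start, 0xFF]."""
    chars = []
    for code in range(start, 0x100):
        try:
            chars.append(bytes([code]).decode(charset))
        except UnicodeDecodeError:
            pass  # code position not assigned in this charset
    return "".join(chars)

def as_body(chars: str, width: int = 16) -> str:
    """Lay the repertoire out in lines of `width` characters for a message body."""
    return "\n".join(chars[i:i + width] for i in range(0, len(chars), width))

print(as_body(repertoire("iso8859_2")))
```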

The set of test messages will be available on the project web page [8], and a pilot version of the online test message constructor/generator will be available on the MUA test page [9]. A test message constructor makes it easier to prepare customized test messages: the user chooses among a number of options (alphabet, keyboard mapping or some quoted text), and the composed test message is then sent to the user’s e-mail address for verification. Moreover, the constructor/generator will create new messages via the provided “webtypewriter”, where a national keyboard image mapping can be chosen; this tool will allow online testers to type short texts in their own languages. All data will be processed by a server-based CGI program and sent to the tester via the normal MTS.
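The keyboard-mapping idea behind the “webtypewriter” can be illustrated with a small Python sketch. It assumes the standard Russian ЙЦУКЕН layout and maps keys pressed on a US keyboard to the Cyrillic letters on the same keys. The table covers only the keys that carry letters, and the function name `type_with_layout` is hypothetical, not part of the project's tools.

```python
# Key positions on a US keyboard and the letters on the same keys in the
# standard Russian (ЙЦУКЕН) layout; ё (on the ` key) is omitted.
US_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

_MAPPING = dict(zip(US_KEYS, RU_KEYS))
# Shifted Latin letter keys produce upper-case Cyrillic letters.
_MAPPING.update({us.upper(): ru.upper()
                 for us, ru in zip(US_KEYS, RU_KEYS) if us.isalpha()})
_TABLE = str.maketrans(_MAPPING)

def type_with_layout(keystrokes: str) -> str:
    """Translate keys pressed on a US keyboard into Russian-layout text."""
    return keystrokes.translate(_TABLE)

print(type_with_layout("Ghbdtn"))
```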

Further multilingual texts will be taken from the regional servers of some international software producers:

Microsoft http://www.microsoft.com
Alis Technologies http://www.alis.com

6.0 Testing Methodology

The basic testing scheme includes the MUAs under test and all the tools needed to generate and check the test messages, i.e. the message constructor/inspector tool, the tools for typing in test messages via keyboard input or copy-and-paste, and the special test message generator tool, which enables the use of special test message sets (see Fig. 1).

The required tests are:

test-1 - Receive all 4 test messages tmsg1-tmsg4 and display them correctly (change the mail reader's language/alphabet/encoding options if needed)

test-2 - Print all 4 messages tmsg1-tmsg4 on the standard printer

test-3 - Reply to messages tmsg1 and tmsg2, and check that information is returned in the same character set as it arrived in

test-4 - Reply to message tmsg3 using the "reply including quote of body" function of the IUT (implementation under test)

test-5 - Reply to message tmsg3 using the environment's "cut and paste" function to insert the non-ASCII characters into the outgoing message

test-6 - Forward all 4 messages to the originator's address

test-7 - Generate, as completely as possible, the same messages from the keyboard of the IUT

test-8* - Check for possible text distortion when exchanging message tmsg5* with a different (non-ASCII) default language/alphabet/encoding setting

test-9* - Perform tests 1-5 for message tmsg6* with the UTF-7/UTF-8 character set
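Tests 3 and 4 can be partly checked automatically. The Python sketch below uses the standard `email` package. Its helpers `make_reply` and `same_charset` are hypothetical, and the "> " quoting prefix is an assumption about the IUT's quoting style. The sketch builds a reply that quotes the original body and checks that the reply declares the same character set as the message it answers.

```python
from email.message import Message
from email.mime.text import MIMEText

def make_reply(original: Message, text: str) -> MIMEText:
    """Reply with a quoted copy of the original body, in the original charset."""
    charset = original.get_content_charset() or "us-ascii"
    body = original.get_payload(decode=True).decode(charset)
    quoted = "\n".join("> " + line for line in body.splitlines())
    reply = MIMEText(f"{text}\n\n{quoted}", "plain", charset)
    reply["Subject"] = "Re: " + original.get("Subject", "")
    return reply

def same_charset(original: Message, reply: Message) -> bool:
    """test-3 criterion: the reply is sent in the charset it arrived in."""
    return original.get_content_charset() == reply.get_content_charset()

tmsg3 = MIMEText("Проверка связи", "plain", "koi8-r")
tmsg3["Subject"] = "tmsg3"
reply = make_reply(tmsg3, "Получено")
print(same_charset(tmsg3, reply))
print(reply.get_payload(decode=True).decode("koi8-r"))
```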

For each test, the result is expressed as "pass", "fail" or "maybe", together with a possible explanation of any problems discovered. The final results for the tested group of MUAs will be published together with recommendations on how to use the evaluated MUAs properly, i.e. how to set their parameters correctly, etc.
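A results template of this kind could look like the minimal Python sketch below. It assumes one record per MUA, test and character set. The class names, fields and the `summarize` function are hypothetical; only the three outcomes come from the text above. The records for "MailerA" and "MailerB" are invented placeholders, not project results.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable

class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    MAYBE = "maybe"

@dataclass(frozen=True)
class TestResult:
    mua: str                 # MUA name and version
    test_id: str             # e.g. "test-3"
    charset: str             # character set used for the test messages
    outcome: Outcome
    explanation: str = ""    # possible explanation of a problem found

def summarize(results: Iterable[TestResult]) -> Dict[str, Counter]:
    """Count pass/fail/maybe outcomes per MUA."""
    summary: Dict[str, Counter] = {}
    for r in results:
        summary.setdefault(r.mua, Counter())[r.outcome.value] += 1
    return summary

# Placeholder records illustrating the format only
results = [
    TestResult("MailerA 1.0", "test-1", "koi8-r", Outcome.PASS),
    TestResult("MailerA 1.0", "test-3", "koi8-r", Outcome.FAIL,
               "reply sent as iso-8859-1"),
    TestResult("MailerB 2.0", "test-1", "iso-8859-2", Outcome.MAYBE,
               "Subject readable only after manual charset change"),
]
print(summarize(results))
```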

7.0 Conclusion

The international environment of the project and its wide geographical range will make it possible to discover the main problems in the current multilingual support of MUAs, including problems that arise from the use of different languages, cultural content and usage practices within the European multilingual community. Notwithstanding the expected widespread implementation of Unicode, problems still show up clearly when working in an environment with multiple languages and character sets: from the user’s point of view, many issues still require solutions for the benefit of the community.

The TERENA evaluation project will provide a detailed set of test results, a set of multilingual test messages and a recommended testbed for forthcoming MUAs, which will help developers and users in their daily work. Another useful by-product of the project will be the online MUA Test Tools. The current version of these test tools is located at the following URL:

http://park.kiev.ua/multiling/ml-mua/testcon.html

Acknowledgment

We wish to express our special thanks to Peter Heijmens Visser of TERENA, who provided MUA usage statistics based on a detailed analysis of the TERENA mail archive for the period preceding the project. Moreover, all our appreciation goes to Harald T. Alvestrand of Maxware, Norway, for his original idea for the project and for his invaluable support during the definition of the project details.

8.0 References

[1] TERENA WG-i18n. http://www.terena.nl/working-groups/wg-i18n/
[2] CEN/TC304 Character set technology. European Localization Requirements. http://www.stri.is/TC304/default.html
[3] Internet Mail Consortium. http://www.imc.org/imc-intl/
[4] Nadine Kano. Developing International Software For Windows 95 and Windows NT. Microsoft Press, 1995 (ISBN 1-55615-840-8).
[5] The ISO 8859 Character Sets. http://park.kiev.ua/multiling/ml-docs/iso-8859.html
[6] Character Set Recognition. http://www.microsoft.com/msdn/sdk/inetsdk/help/dhtml/references/charsets/charset4.htm
[7] TERENA WG-I18N / WG-MSG Pilot Project Proposal "Multilingual Email Clients - A Test". http://www.terena.nl/projects/proposed/proposal1.htm
[8] Multilingual Mail Users Agents. TERENA Pilot Project Homepage. http://park.kiev.ua/multiling/ml-mua/
[9] Multilingual Mail Users Agents. Test Page. http://park.kiev.ua/multiling/ml-mua/mlmua-test.html
[10] TERENA WG-MSG. http://www.terena.nl/working-groups/wg-msg/

Appendix A

List of MUAs to be tested

Based on MUA usage statistics in the TERENA community for a sample of more than 3000 messages.

An updated version is available at http://park.kiev.ua/multiling/ml-mua/mua-statist.html

MS Windows MUA                    Usage per 1000    UNIX MUA                  Usage per 1000
Netscape Mozilla (All Versions)   201               ELM                       67
Windows Eudora                    165               pine                      40
MS Mailers (All Versions)         78                exmh                      20
MS Outlook                        14                Netscape Mail for UNIX    13
MS Exchange                       31                Z-Mail
Pegasus                           84
The Bat! by RIT Research Labs.

Additional list

Simeon                            17
Forte Agent                       28
Alis Tango Mailer