ANNEX 7

ATLANTIC SUBSET
OF THE INTERNATIONAL STANDARD
ISO/IEC 10646 UNIVERSAL MULTIPLE-OCTET CODED CHARACTER SET

1994-09-15
corrected 1998-08-04
J. W. van Wingen

NOTE: The following text presents what could have been an Atlantic Standard, if such a thing existed. Thus it has no formal status at all. Nevertheless, it contains everything a serious user would need to know when he wants to use ISO/IEC 10646-1 for applications restricted to the Atlantic part of the world, without spending $400 on the complete thing.

Part 1: General structure and Latin script

INTRODUCTION

It is commonly understood that the whole repertoire of ISO/IEC 10646, Universal Multiple-octet Coded Character Set, is not a firm requirement for large groups of users of European languages on both sides of the Atlantic. To give guidance to manufacturers and users, so that they may avoid making their own selection, a subset is defined that specifies the coding of those characters identified as the total character repertoire needed for European languages. A larger subset than the minimum set specified here may be needed for special applications; such extensions are not prohibited. Some recommendations are given in an Annex for sets of characters needed with identified applications.

The text of this Atlantic Standard (AS) is based on that of ISO/IEC 10646-1:1993 where possible, after removal of everything that is not relevant to the Atlantic situation. This makes the AS a self-contained document which does not require the reader, if he is interested only in characters used in European languages, to consult ISO/IEC 10646-1:1993. On the other hand, if the reader wants to understand the principles of multi-octet coding, and the way these are applied to any script, the study of ISO/IEC 10646-1:1993 is an absolute requirement, in particular where information is wanted on a transformation format (UTF-8), retransmission, octet value representation notations, or character naming guidelines (presented in its Annexes G, H, J, K, R). Thus this AS does not replace the ISO/IEC standard, not even where European scripts are in exclusive use. The text as presented just states what is needed with European languages to specify the coding of the characters contained in this AS, and to indicate requirements for conforming equipment or other character-supporting products at procurement. Should, despite the great care taken in the preparation of this document, the text of this AS lead to an interpretation or conclusion different from that reached from reading ISO/IEC 10646-1:1993, then the latter will prevail.

The numbers of the original clauses of ISO/IEC 10646-1 are given between parentheses behind the number in the heading of the corresponding clause of this Subset, to facilitate comparison. Further reference to this AS will be made as to "this Subset".

Only two-octet coding is used for characters in this Subset.

No levels of implementation are specified.

1 (1) SCOPE

This Atlantic Standard specifies a subset of ISO/IEC 10646, Universal Multiple-octet Coded Character Set, required for coding the character repertoire in modern use of the listed European languages, written with Latin, Greek or Cyrillic script.

Covered are:

Official languages using Latin script:
 
Albanian 
Croat 
Czech 
Danish 
Dutch 
English 
Estonian 
Finnish 
French 
German 
Hungarian 
Icelandic 
Irish 
Italian
Latvian
Lithuanian
Luxemburgish
Maltese
Norwegian
Polish
Portuguese
Romanian
Slovak
Slovenian
Spanish
Swedish
Turkish

Official languages using Greek script:

Greek

Official languages using Cyrillic script:

Bulgarian
Byelorussian
Macedonian
Russian
Serbian
Ukrainian

Regional languages using Latin script:

Basque (France, Spain)
Breton (France)
Catalan (France, Spain, Andorra)
Faroese (Denmark)
Frisian (Netherlands)
Gaelic (UK)
Galician (Spain)
Greenlandic (Denmark)
Rumantsch (Switzerland)
Sami (Norway, Sweden, Finland)
Sorbian (Germany)
Welsh (UK)

The coding of the repertoires for Afrikaans and Esperanto is included in some normative tables for compatibility with the repertoire of ISO/IEC 6937:1994.

Part 1 of this AS covers Latin script; other parts will specify the coding of Greek and Cyrillic script.

The coding method used is that of two-octet form (UCS-2), because all required characters fit in the Basic Multilingual Plane (BMP) that is specified in ISO/IEC 10646-1:1993. No difference exists between any coding in this Subset and the UCS-2 coding of the same character.

A number of subrepertoires of this Subset are indicated, each identified by a name, to enable the user to state his requirements in terms of options.

Information on the coding of some characters outside the repertoire specified in ISO/IEC 10367 for use in special applications required for restricted groups of European users is presented in a normative Annex.

2 (2) CONFORMANCE

2.1 Conformance of information interchange

A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with this Subset if all the coded representations of characters within that CC-data-element conform to the requirements of clause 6.

A claim of conformance shall identify whether the European Latin, the European Greek, the European Cyrillic or the European Special Character Repertoire, or any other subrepertoire of this Subset specified in this AS, or a combination of these, is adopted.

2.2 Conformance of devices

A device is in conformance with this Subset if it conforms to the requirements of 2.2.1, and either or both of 2.2.2 and 2.2.3.

A claim of conformance shall identify the document which contains the description specified in 2.2.1, and shall identify whether the European Latin, the European Greek, the European Cyrillic or the European Special Character Repertoire, or any other subrepertoire of this Subset specified in this AS, or a combination of these, is adopted.

2.2.1 Device description

A device that conforms to this Subset shall be the subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in 2.2.2 and 2.2.3.

2.2.2 Originating devices

An originating device shall allow its user to supply any sequence of characters from the repertoire adopted, and shall be capable of transmitting their coded representations within a CC-data-element.

2.2.3 Receiving devices

A receiving device shall be capable of receiving and interpreting any coded representations of characters that are within a CC-data-element, and that conform to 2.1, and shall make the corresponding characters available to its user in such a way that the user can identify them from among those of the repertoire adopted, and can distinguish them from each other.

3 (3) NORMATIVE REFERENCES

The following standards contain provisions which, through reference in this text, constitute provisions of this Atlantic Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Atlantic Standard are encouraged to investigate the possibility of applying the most recent editions of the standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards.

ISO/IEC 2022:1994 Information processing - 7-bit and 8-bit coded character sets - Code extension techniques.

ISO/IEC 6429:1993 Information processing - Control Functions.

ISO/IEC 10367:1991 Information processing - Standardized coded graphic character sets for use in 8-bit codes.

ISO/IEC 10646-1:1993 Information processing - Universal Multiple-octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane.

4 (4) DEFINITIONS

The numbers of definitions are those given in ISO/IEC 10646-1:1993.

Where necessitated by the scope of this Subset, definitions have been changed to avoid referring to features not included.

4.4 coded-character-data-element (CC-data-element): An element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets.

4.5 cell : The place within a row at which an individual character may be allocated.
4.6 character : A member of a set of elements used for the organization, control or representation of data.
4.8 coded character : A character together with its coded representation.
4.9 coded character set; code : A set of unambiguous rules that establishes a character set and the relationship between the characters of the set and their coded representation.
4.10 code table : A table showing the characters allocated to the octets in a code.
4.14 control function : An action that affects the recording, processing, transmission or interpretation of data, and that has a coded representation consisting of one or more octets.
4.17 device: A component of information processing equipment which can transmit, and/or receive, coded information within CC-data-elements.
4.18 graphic character : A character, other than a control function, that has a visual representation normally handwritten, printed or displayed.
4.19 graphic symbol : A visual representation of a graphic character or of a control function.
4.23 octet : An ordered sequence of eight bits considered as a unit.
4.24 plane : The coding space of this Subset, consisting of 256 rows.
4.28 repertoire : A specified set of characters that are represented by means of one or more bit combinations of a coded character set.
4.29 row : A subdivision of a plane; of 256 cells.
4.30 script : A set of graphic characters used for the written form of one or more languages.
4.32 user: A person or other entity that invokes the services provided by a device.

5 (5) THE UNIVERSAL MULTIPLE-OCTET CODED CHARACTER SET GENERAL STRUCTURE

The general structure of the Universal Multiple-Octet Coded Character Set, of which this Subset is a proper subset, is described in this explanatory clause. The normative specification of the structure is given in later clauses (this is indicated by the use of the term "shall").

The value of any octet is expressed in hexadecimal notation from 00 to FF in ISO/IEC 10646.

The canonical form of UCS uses a four-dimensional coding space, regarded as a single entity, consisting of 256 * 256 planes. Each plane consists of 256 one-dimensional rows, each row consisting of 256 cells. A character is located and coded at a cell within this coding space or the cell is declared unused.

In the canonical form, four octets are used to represent each character. The first plane, having 00 00 as its first two octets, is called the Basic Multilingual Plane (BMP).

In addition to the canonical form, a two-octet BMP is specified. This BMP can be used as a two-octet coded character set identified as UCS-2.
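As an informal illustration only (it is not part of ISO/IEC 10646 or of this Subset), the relation between the two-octet BMP form and the canonical four-octet form described above can be sketched in Python. The function names are hypothetical and chosen for this example.

```python
def ucs2_to_canonical(ucs2: bytes) -> bytes:
    """Prefix the two BMP octets (row, cell) with the octets 00 00."""
    if len(ucs2) != 2:
        raise ValueError("a UCS-2 coded character is exactly two octets")
    return b"\x00\x00" + ucs2


def canonical_to_ucs2(ucs4: bytes) -> bytes:
    """Drop the leading 00 00 of a BMP character; reject other planes."""
    if len(ucs4) != 4:
        raise ValueError("the canonical form uses exactly four octets")
    if ucs4[:2] != b"\x00\x00":
        raise ValueError("character is outside the Basic Multilingual Plane")
    return ucs4[2:]
```

Since this Subset uses two-octet coding only, such a conversion would be needed only when exchanging data with a system that uses the canonical form.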

Subsets of the coding space may be used to give a sub-repertoire of graphic characters. The Atlantic Subset specifies a selection of the coded characters of UCS-2, using two-octet coding only.

6 (6) CODING OF CHARACTERS

In the UCS-2, and thus in this Subset, each character shall be represented by a sequence of two octets. The most significant octet of this sequence shall be the row-octet. The least significant octet of this sequence shall be the cell-octet. Terming the octets for brevity as R-octet and C-octet, this sequence may be represented as

most-significant least-significant

R-octet C-octet

The value of any octet shall be represented by two hexadecimal digits, for example: 31 or FE. When a single character is to be identified in terms of the values of its row and cell, this shall be represented such as

0030 for DIGIT ZERO
0041 for LATIN CAPITAL LETTER A
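The row and cell notation can be sketched as follows. This is an illustrative helper, assuming a character is handled as an integer code value in the range 0000 to FFFF; the names split_code and format_code are invented for the example.

```python
def split_code(code: int) -> tuple[int, int]:
    """Return (R-octet, C-octet) for a two-octet code value."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code does not fit in two octets")
    return code >> 8, code & 0xFF


def format_code(row: int, cell: int) -> str:
    """Write row and cell as four hexadecimal digits, row first."""
    return f"{row:02X}{cell:02X}"


row, cell = split_code(0x0041)   # LATIN CAPITAL LETTER A
print(format_code(row, cell))    # row 00, cell 41
```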

Within each octet the most significant bit shall be bit 8 and the least significant bit shall be bit 1. Accordingly, the weight allocated to each bit shall be

high order bits low order bits

bit: b8 b7 b6 b5 b4 b3 b2 b1
weight: 128 64 32 16 8 4 2 1
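The bit weights above can be checked with a short sketch. It assumes bits are named by their numbers 1 to 8 as in the table; the helper names are hypothetical.

```python
def octet_value(bits_set: list[int]) -> int:
    """Sum the weights of the bits that are set (bit 1 = 1, bit 8 = 128)."""
    value = 0
    for bit in bits_set:
        if not 1 <= bit <= 8:
            raise ValueError("bit numbers run from 1 to 8")
        value += 1 << (bit - 1)
    return value


def bits_of(value: int) -> list[int]:
    """List the numbers of the bits set in an octet, highest first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("an octet holds values 00 to FF")
    return [bit for bit in range(8, 0, -1) if value & (1 << (bit - 1))]
```

For instance, bits 6, 5 and 1 carry the weights 32, 16 and 1, which add up to the octet value 31 in hexadecimal notation.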

The sequence of the octets that represent a character, and the most significant and least significant ends of it, shall be maintained as shown above. When serialized as octets, a more significant octet shall precede less significant octets. When not serialized as octets, the order of octets may be specified by agreement between sender and recipient.
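A minimal sketch of serialization with the R-octet first is given below. It assumes the text is held as a Python string and that every character lies in the BMP; the function name serialize is invented for the example.

```python
def serialize(text: str) -> bytes:
    """Emit two octets per character, most significant (row) octet first."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"{ch!r} cannot be coded in two octets")
        out += bytes((code >> 8, code & 0xFF))
    return bytes(out)


print(serialize("A\u00e1").hex().upper())  # the octets 00 41 00 E1 in sequence
```

For characters in the BMP this gives the same octets as Python's built-in "utf-16-be" codec, which may be used instead where convenient.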

7 (7) SPECIAL FEATURES OF THIS SUBSET
 


8 (13) NATURE OF THIS SUBSET

ISO/IEC 10646 provides the specification of subsets of coded graphic characters for use in interchange, by originating devices and by receiving devices.

This Subset presents a "limited" subset in the sense defined in subclause 13.1 of ISO/IEC 10646-1:1993, in that it consists of a list of the graphic characters in the specified subset. It contains no reference to any of the collections listed in Annex A of that International Standard, such as LATIN-1 SUPPLEMENT, LATIN EXTENDED-A, LATIN EXTENDED-B or LATIN EXTENDED ADDITIONAL. Many of these collections contain characters not in the repertoire of this Subset.

9 (14) CODED REPRESENTATION FORM OF THIS SUBSET

This Subset provides only a single form, that of characters from the European repertoire with each character represented by two octets.

Within a CC-data-element conforming to the requirements of this Subset a character from the repertoire of this Subset shall be represented by two octets comprising the R-octet and the C-octet as specified in clause 6.
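The requirement can be illustrated by a small check that reads a CC-data-element two octets at a time and reports characters outside an adopted repertoire. This is a sketch only: SAMPLE_REPERTOIRE holds just a few codes from Table 1 for demonstration, the function name is hypothetical, and control functions (clause 11) are not handled.

```python
# A few codes from Table 1, for demonstration only (not a full repertoire).
SAMPLE_REPERTOIRE = {0x0041, 0x0061, 0x00E1, 0x1E83}


def nonconforming_offsets(data: bytes, repertoire: set[int]) -> list[int]:
    """Return the octet offsets of characters not in the repertoire."""
    if len(data) % 2:
        raise ValueError("two-octet coding requires an even number of octets")
    bad = []
    for offset in range(0, len(data), 2):
        code = (data[offset] << 8) | data[offset + 1]  # R-octet, then C-octet
        if code not in repertoire:
            bad.append(offset)
    return bad
```

An empty result means every coded character in the element belongs to the repertoire supplied.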

10 (15) IMPLEMENTATION LEVELS

This Subset does not specify implementation levels.

11 (16) USE OF CONTROL FUNCTIONS WITH THIS SUBSET

This Subset provides for use of control functions encoded according to ISO 2022, ISO/IEC 6429 or similarly structured standards for control functions, and standards derived from these. A set or subset of such control functions may be used in conjunction with this coded character set. These standards encode a control function as a sequence of one or more octets.

When a C0 control function of ISO/IEC 6429 is used with this coded character set, its coded representation as specified in ISO/IEC 6429 shall be padded to correspond with the number of octets adopted in this Subset. Thus, the least significant octet shall be the bit combination specified in ISO/IEC 6429, and the more significant octet shall consist of zeros only.

For example, the control function FORM FEED is represented by "000C" in this Subset.

For escape sequences, control sequences, and control strings (see ISO/IEC 6429) consisting of a coded control function consisting of a single bit combination, followed by additional bit combinations in the range 20 to 7F, each bit combination shall be padded by an octet with value 00.

For example, the escape sequence "ESC 02/00 04/00" is represented by "001B 0020 0040".
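The padding rule for C0 control functions and escape sequences can be sketched as below. The helper names pad_octets and show are invented for this illustration; the input is assumed to be the octet sequence as coded in ISO/IEC 6429.

```python
def pad_octets(sequence: bytes) -> bytes:
    """Precede every octet of an ISO/IEC 6429 sequence with an octet 00."""
    out = bytearray()
    for octet in sequence:
        out += bytes((0x00, octet))
    return bytes(out)


def show(padded: bytes) -> str:
    """Write padded octets as groups of four hexadecimal digits."""
    pairs = (padded[i:i + 2] for i in range(0, len(padded), 2))
    return " ".join(pair.hex().upper() for pair in pairs)


print(show(pad_octets(b"\x0c")))                    # FORM FEED
print(show(pad_octets(bytes((0x1B, 0x20, 0x40)))))  # ESC 02/00 04/00
```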

When a C1 control function of ISO/IEC 6429 is used with this coded character set, it shall be coded as an ESC Fe sequence (see ISO/IEC 6429), padded as specified above.

For example, the control function PARTIAL LINE BACKWARD - PLU (08/12 in ISO/IEC 6429 representation) is represented by "001B 004C".
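The C1 rule can be sketched in the same way. The sketch assumes the usual ISO/IEC 6429 correspondence in which the final octet Fe of the escape sequence is the 8-bit C1 code minus hexadecimal 40 (so 08/12 becomes 04/12); the function name is hypothetical.

```python
ESC = 0x1B


def c1_as_padded_esc_fe(c1_code: int) -> bytes:
    """Code an 8-bit C1 control function as a padded ESC Fe sequence."""
    if not 0x80 <= c1_code <= 0x9F:
        raise ValueError("C1 control functions lie in columns 08 and 09")
    final = c1_code - 0x40           # e.g. 08/12 -> 04/12
    return bytes((0x00, ESC, 0x00, final))


print(c1_as_padded_esc_fe(0x8C).hex().upper())  # PLU: octets 00 1B 00 4C
```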

Code extension control functions for the ISO 2022 code extension techniques (such as designation escape sequence, single shift and locking shift) shall not be used with this coded character set.

12 (17) DECLARATION OF IDENTIFICATION OF FEATURES

12.1 Purpose and context of identification

CC-data-elements conforming to ISO/IEC 10646 are intended to form all or part of a composite unit of coded information that is interchanged between an originator and a recipient. The identification of ISO/IEC 10646, this Subset, or any subset of it, that has been adopted by the originator must also be available to the recipient. The route by which such identification is communicated to the recipient is outside the scope of ISO/IEC 10646 and this Subset.

However, some standards for interchange of coded information may permit, or require, that the coded representation of the identification applicable to the CC-data-element forms a part of the interchanged information. Such coded representations provide all or part of an identification data element, which may be included in information interchange in accordance with the relevant standard.

12.2 Specification of identification

The coded representation for the identification of this Subset, or of any of its subrepertoires, or of a control function set used with any of those, is specified in another Atlantic Standard (in preparation).

13 (18) STRUCTURE OF THE CODE TABLES AND LISTS

Clause 14 (25) sets out the detailed code tables and the lists of character names; for each graphic character these give its coded representation and its character name.

The graphic symbols are to be regarded as typical visual representations of the characters. ISO/IEC 10646 does not attempt (nor does this Subset) to prescribe the exact shape of each character. The shape is affected by the design of the font employed, which is outside the scope of ISO/IEC 10646.

Graphic characters specified in ISO/IEC 10646 are uniquely identified by their names. This does not imply that the graphic symbols by which they are commonly imaged are always different. Examples of graphic characters with similar graphic symbols are LATIN CAPITAL LETTER A, GREEK CAPITAL LETTER ALPHA, and CYRILLIC CAPITAL LETTER A.
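The point that names, not shapes, identify characters can be demonstrated with Python's standard unicodedata module. This is an illustration only; it relies on the Unicode character database shipped with Python, whose names for these three characters agree with those quoted above.

```python
import unicodedata

# Three characters that are commonly imaged with the same graphic symbol.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]


def names(chars: list[str]) -> list[str]:
    """Look up the character name of each character."""
    return [unicodedata.name(ch) for ch in chars]


for ch, name in zip(LOOKALIKES, names(LOOKALIKES)):
    print(f"{ord(ch):04X}  {name}")
```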

The meaning attributed to any character is not specified by ISO/IEC 10646; it may differ from country to country, or from one application to another.

14 (25) CODE TABLES AND LISTS OF CHARACTER NAMES

The coding of a character shall be as specified in the tables of this clause. The characters included in these tables constitute the Atlantic Subset of ISO/IEC 10646.

14.1 The characters and their coding required for the European Latin repertoire of letters and digits are specified in Table 1.

14.2 The characters and their coding required for the European Special Characters repertoire are specified in Table 2.

NOTE:
These repertoires taken together (the European Latin subrepertoire) contain that of ISO/IEC 6937:1994 as a proper subrepertoire, the remaining characters being those needed for Welsh. A claim stating that the ISO/IEC 6937:1994 repertoire is covered may be formulated as of covering the European Latin subrepertoire without Full Welsh, or by the Latin-Telematic subrepertoire (see Table 1).

14.3 The characters and their coding required for the European Greek repertoire of letters and special characters are specified in Part 2 of this AS.

14.4 The characters and their coding required for the European Cyrillic repertoire of letters and special characters are specified in Part 3 of this AS.

14.5 The characters and their coding required for the European repertoire of box drawing characters (as included in ISO/IEC 10367) are specified in Part 4 of this AS.

14.6 The relation between the Atlantic Subset and its subrepertoires may be illustrated by the following scheme:


TABLE 1

VERSION 2.1
1995-02-15, corrected 1998-08-04
J. W. van Wingen

COMPLETE REPERTOIRE OF LETTERS AND DIGITS REQUIRED FOR LATIN WRITTEN EUROPEAN LANGUAGES

Grouped to Short Identifier (SID)
Transformation to ASCII in first column,
SGML public entities in 2nd column
Indication in columns 63-72:
Table in ISO 10367 where the character is included:

0  (Table 1/2) Basic G0 Set (as of ISO 4873)
1  (Table 3/4) Latin Alphabet No. 1 (as of ISO 8859-1)
2  (Table 5/6) Latin Alphabet No. 2 (as of ISO 8859-2)
3  (Table 7/8) Latin Alphabet No. 3 (as of ISO 8859-3)
4  (Table 9/10) Latin Alphabet No. 4 (as of ISO 8859-4)
5  (Table 11/12) Latin Alphabet No. 5 (as of ISO 8859-9)
A  (Table 21/22) Supplementary Set for Latin Alphabets
C/X  (Table A.1/2) ISO 6937, Supplementary Set (C) or Repertoire only (X)
T  Used in Teletex (CCITT T.61)
V  Used in Videotex (CCITT T.101)

Code in ISO 10646 indicated as double bytes in hexadecimal notation.
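The ten-character indication column can be read mechanically. The sketch below assumes, as inferred from the rows of the table, that the ten positions carry the marks 0, 1, 2, 3, 4, 5, A, C or X, T and V in that order, with a full stop where the character is absent; the names FLAG_POSITIONS and decode_flags are hypothetical.

```python
# (marks allowed at this position, meaning); order inferred from the table rows.
FLAG_POSITIONS = [
    ("0", "Basic G0 Set"),
    ("1", "Latin Alphabet No. 1"),
    ("2", "Latin Alphabet No. 2"),
    ("3", "Latin Alphabet No. 3"),
    ("4", "Latin Alphabet No. 4"),
    ("5", "Latin Alphabet No. 5"),
    ("A", "Supplementary Set for Latin Alphabets"),
    ("CX", "ISO 6937"),
    ("T", "Teletex"),
    ("V", "Videotex"),
]


def decode_flags(flags: str) -> list[str]:
    """Translate an indication string such as '.12345.XTV' into a list."""
    if len(flags) != len(FLAG_POSITIONS):
        raise ValueError("the indication column has ten positions")
    result = []
    for mark, (allowed, meaning) in zip(flags, FLAG_POSITIONS):
        if mark == ".":
            continue                      # not included in this table
        if mark not in allowed:
            raise ValueError(f"unexpected mark {mark!r}")
        if allowed == "CX":
            kind = "supplementary set" if mark == "C" else "repertoire only"
            meaning = f"{meaning} ({kind})"
        result.append(meaning)
    return result
```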

Named subrepertoires:

BASIC LATIN : requires all characters of this table marked with 0.
LATIN-1 : requires all characters of this table marked with 1.
LATIN-TELEMATIC: requires all characters of this table.
Note: These subrepertoires also require characters from Table 2.
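Selecting a named subrepertoire from the rows of this table can be sketched as follows. The sketch assumes that the last field of a row is the ISO 10646 code and the field before it the indication string; ROWS holds three rows copied from the table for demonstration, and the function names are invented. Characters required from Table 2 are not covered.

```python
ROWS = """\
LA01  LATIN SMALL LETTER A  0.......TV  0061
/a  &aacute  LA11  LATIN SMALL LETTER A WITH ACUTE  .12345.XTV  00E1
/c  &cacute  LC11  LATIN SMALL LETTER C WITH ACUTE  ..2....XTV  0107
"""


def parse_row(line: str) -> tuple[int, str]:
    """Return (code value, indication string) for one table row."""
    fields = line.split()
    return int(fields[-1], 16), fields[-2]


def subrepertoire(rows: str, mark: str) -> set[int]:
    """Collect the codes of all rows whose indication contains the mark."""
    selected = set()
    for line in rows.splitlines():
        if not line.strip():
            continue                      # skip blank separator lines
        code, flags = parse_row(line)
        if mark in flags:
            selected.add(code)
    return selected


basic_latin = subrepertoire(ROWS, "0")    # rows marked with 0
latin_1 = subrepertoire(ROWS, "1")        # rows marked with 1
```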
 
Transformation to ASCII  SGML public entity  Short identifier (SID)  Letter name  Tables in ISO 10367  ISO 10646 code (hexadecimal)
LA01  LATIN SMALL LETTER A  0.......TV  0061
LA02  LATIN CAPITAL LETTER A  0.......TV  0041
LB01  LATIN SMALL LETTER B  0.......TV  0062
LB02  LATIN CAPITAL LETTER B  0.......TV  0042
LC01  LATIN SMALL LETTER C  0.......TV  0063
LC02  LATIN CAPITAL LETTER C  0.......TV  0043
LD01  LATIN SMALL LETTER D  0.......TV  0064
LD02  LATIN CAPITAL LETTER D  0.......TV  0044
LE01  LATIN SMALL LETTER E  0.......TV  0065
LE02  LATIN CAPITAL LETTER E  0.......TV  0045
LF01  LATIN SMALL LETTER F  0.......TV  0066
LF02  LATIN CAPITAL LETTER F  0.......TV  0046
LG01  LATIN SMALL LETTER G  0.......TV  0067
LG02  LATIN CAPITAL LETTER G  0.......TV  0047
LH01  LATIN SMALL LETTER H  0.......TV  0068
LH02  LATIN CAPITAL LETTER H  0.......TV  0048
LI01  LATIN SMALL LETTER I  0.......TV  0069
LI02  LATIN CAPITAL LETTER I  0.......TV  0049
LJ01  LATIN SMALL LETTER J  0.......TV  006A
LJ02  LATIN CAPITAL LETTER J  0.......TV  004A
LK01  LATIN SMALL LETTER K  0.......TV  006B
LK02  LATIN CAPITAL LETTER K  0.......TV  004B
LL01  LATIN SMALL LETTER L  0.......TV  006C
LL02  LATIN CAPITAL LETTER L  0.......TV  004C
LM01  LATIN SMALL LETTER M  0.......TV  006D
LM02  LATIN CAPITAL LETTER M  0.......TV  004D
LN01  LATIN SMALL LETTER N  0.......TV  006E
LN02  LATIN CAPITAL LETTER N  0.......TV  004E
LO01  LATIN SMALL LETTER O  0.......TV  006F
LO02  LATIN CAPITAL LETTER O  0.......TV  004F
LP01  LATIN SMALL LETTER P  0.......TV  0070
LP02  LATIN CAPITAL LETTER P  0.......TV  0050
LQ01  LATIN SMALL LETTER Q  0.......TV  0071
LQ02  LATIN CAPITAL LETTER Q  0.......TV  0051
LR01  LATIN SMALL LETTER R  0.......TV  0072
LR02  LATIN CAPITAL LETTER R  0.......TV  0052
LS01  LATIN SMALL LETTER S  0.......TV  0073
LS02  LATIN CAPITAL LETTER S  0.......TV  0053
LT01  LATIN SMALL LETTER T  0.......TV  0074
LT02  LATIN CAPITAL LETTER T  0.......TV  0054
LU01  LATIN SMALL LETTER U  0.......TV  0075
LU02  LATIN CAPITAL LETTER U  0.......TV  0055
LV01  LATIN SMALL LETTER V  0.......TV  0076
LV02  LATIN CAPITAL LETTER V  0.......TV  0056
LW01  LATIN SMALL LETTER W  0.......TV  0077
LW02  LATIN CAPITAL LETTER W  0.......TV  0057
LX01  LATIN SMALL LETTER X  0.......TV  0078
LX02  LATIN CAPITAL LETTER X  0.......TV  0058
LY01  LATIN SMALL LETTER Y  0.......TV  0079
LY02  LATIN CAPITAL LETTER Y  0.......TV  0059
LZ01  LATIN SMALL LETTER Z  0.......TV  007A
LZ02  LATIN CAPITAL LETTER Z  0.......TV  005A
/a  &aacute  LA11  LATIN SMALL LETTER A WITH ACUTE  .12345.XTV  00E1
/A  &Aacute  LA12  LATIN CAPITAL LETTER A WITH ACUTE  .12345.XTV  00C1
/c  &cacute  LC11  LATIN SMALL LETTER C WITH ACUTE  ..2....XTV  0107
/C  &Cacute  LC12  LATIN CAPITAL LETTER C WITH ACUTE  ..2....XTV  0106
/e  &eacute  LE11  LATIN SMALL LETTER E WITH ACUTE  .12345.XTV  00E9
/E  &Eacute  LE12  LATIN CAPITAL LETTER E WITH ACUTE  .12345.XTV  00C9
/i  &iacute  LI11  LATIN SMALL LETTER I WITH ACUTE  .12345.XTV  00ED
/I  &Iacute  LI12  LATIN CAPITAL LETTER I WITH ACUTE  .12345.XTV  00CD
/l  &lacute  LL11  LATIN SMALL LETTER L WITH ACUTE  ..2....XTV  013A
/L  &Lacute  LL12  LATIN CAPITAL LETTER L WITH ACUTE  ..2....XTV  0139
/n  &nacute  LN11  LATIN SMALL LETTER N WITH ACUTE  ..2....XTV  0144
/N  &Nacute  LN12  LATIN CAPITAL LETTER N WITH ACUTE  ..2....XTV  0143
/o  &oacute  LO11  LATIN SMALL LETTER O WITH ACUTE  .123.5.XTV  00F3
/O  &Oacute  LO12  LATIN CAPITAL LETTER O WITH ACUTE  .123.5.XTV  00D3
/r  &racute  LR11  LATIN SMALL LETTER R WITH ACUTE  ..2....XTV  0155
/R  &Racute  LR12  LATIN CAPITAL LETTER R WITH ACUTE  ..2....XTV  0154
/s  &sacute  LS11  LATIN SMALL LETTER S WITH ACUTE  ..2....XTV  015B
/S  &Sacute  LS12  LATIN CAPITAL LETTER S WITH ACUTE  ..2....XTV  015A
/u  &uacute  LU11  LATIN SMALL LETTER U WITH ACUTE  .12345.XTV  00FA
/U  &Uacute  LU12  LATIN CAPITAL LETTER U WITH ACUTE  .12345.XTV  00DA
/w  &wacute  LW11  LATIN SMALL LETTER W WITH ACUTE  * ..........  1E83
/W  &Wacute  LW12  LATIN CAPITAL LETTER W WITH ACUTE  * ..........  1E82
/y  &yacute  LY11  LATIN SMALL LETTER Y WITH ACUTE  .12...AXTV  00FD
/Y  &Yacute  LY12  LATIN CAPITAL LETTER Y WITH ACUTE  .12...AXTV  00DD
/z  &zacute  LZ11  LATIN SMALL LETTER Z WITH ACUTE  ..2....XTV  017A
/Z  &Zacute  LZ12  LATIN CAPITAL LETTER Z WITH ACUTE  ..2....XTV  0179

\a  &agrave  LA13  LATIN SMALL LETTER A WITH GRAVE  .1.3.5.XTV  00E0
\A  &Agrave  LA14  LATIN CAPITAL LETTER A WITH GRAVE  .1.3.5.XTV  00C0
\e  &egrave  LE13  LATIN SMALL LETTER E WITH GRAVE  .1.3.5.XTV  00E8
\E  &Egrave  LE14  LATIN CAPITAL LETTER E WITH GRAVE  .1.3.5.XTV  00C8
\i  &igrave  LI13  LATIN SMALL LETTER I WITH GRAVE  .1.3.5.XTV  00EC
\I  &Igrave  LI14  LATIN CAPITAL LETTER I WITH GRAVE  .1.3.5.XTV  00CC
\o  &ograve  LO13  LATIN SMALL LETTER O WITH GRAVE  .1.3.5.XTV  00F2
\O  &Ograve  LO14  LATIN CAPITAL LETTER O WITH GRAVE  .1.3.5.XTV  00D2
\u  &ugrave  LU13  LATIN SMALL LETTER U WITH GRAVE  .1.3.5.XTV  00F9
\U  &Ugrave  LU14  LATIN CAPITAL LETTER U WITH GRAVE  .1.3.5.XTV  00D9
\w  &wgrave  LW13  LATIN SMALL LETTER W WITH GRAVE  * ..........  1E81
\W  &Wgrave  LW14  LATIN CAPITAL LETTER W WITH GRAVE  * ..........  1E80
\y  &ygrave  LY13  LATIN SMALL LETTER Y WITH GRAVE  * ..........  1EF3
\Y  &Ygrave  LY14  LATIN CAPITAL LETTER Y WITH GRAVE  * ..........  1EF2

>a  &acirc  LA15  LATIN SMALL LETTER A WITH CIRCUMFLEX  .12345.XTV  00E2
>A  &Acirc  LA16  LATIN CAPITAL LETTER A WITH CIRCUMFLEX  .12345.XTV  00C2
>c  &ccirc  LC15  LATIN SMALL LETTER C WITH CIRCUMFLEX  ...3..AXTV  0109
>C  &Ccirc  LC16  LATIN CAPITAL LETTER C WITH CIRCUMFLEX  ...3..AXTV  0108
>e  &ecirc  LE15  LATIN SMALL LETTER E WITH CIRCUMFLEX  .1.3.5.XTV  00EA
>E  &Ecirc  LE16  LATIN CAPITAL LETTER E WITH CIRCUMFLEX  .1.3.5.XTV  00CA
>g  &gcirc  LG15  LATIN SMALL LETTER G WITH CIRCUMFLEX  ...3..AXTV  011D
>G  &Gcirc  LG16  LATIN CAPITAL LETTER G WITH CIRCUMFLEX  ...3..AXTV  011C
>h  &hcirc  LH15  LATIN SMALL LETTER H WITH CIRCUMFLEX  ...3..AXTV  0125
>H  &Hcirc  LH16  LATIN CAPITAL LETTER H WITH CIRCUMFLEX  ...3..AXTV  0124
>i  &icirc  LI15  LATIN SMALL LETTER I WITH CIRCUMFLEX  .12345.XTV  00EE
>I  &Icirc  LI16  LATIN CAPITAL LETTER I WITH CIRCUMFLEX  .12345.XTV  00CE
>j  &jcirc  LJ15  LATIN SMALL LETTER J WITH CIRCUMFLEX  ...3..AXTV  0135
>J  &Jcirc  LJ16  LATIN CAPITAL LETTER J WITH CIRCUMFLEX  ...3..AXTV  0134
>o  &ocirc  LO15  LATIN SMALL LETTER O WITH CIRCUMFLEX  .12345.XTV  00F4
>O  &Ocirc  LO16  LATIN CAPITAL LETTER O WITH CIRCUMFLEX  .12345.XTV  00D4
>s  &scirc  LS15  LATIN SMALL LETTER S WITH CIRCUMFLEX  ...3..AXTV  015D
>S  &Scirc  LS16  LATIN CAPITAL LETTER S WITH CIRCUMFLEX  ...3..AXTV  015C
>u  &ucirc  LU15  LATIN SMALL LETTER U WITH CIRCUMFLEX  .1.345.XTV  00FB
>U  &Ucirc  LU16  LATIN CAPITAL LETTER U WITH CIRCUMFLEX  .1.345.XTV  00DB
>w  &wcirc  LW15  LATIN SMALL LETTER W WITH CIRCUMFLEX  ......AXTV  0175
>W  &Wcirc  LW16  LATIN CAPITAL LETTER W WITH CIRCUMFLEX  ......AXTV  0174
>y  &ycirc  LY15  LATIN SMALL LETTER Y WITH CIRCUMFLEX  ......AXTV  0177
>Y  &Ycirc  LY16  LATIN CAPITAL LETTER Y WITH CIRCUMFLEX  ......AXTV  0176

%a  &auml  LA17  LATIN SMALL LETTER A WITH DIAERESIS  .12345.XTV  00E4
%A  &Auml  LA18  LATIN CAPITAL LETTER A WITH DIAERESIS  .12345.XTV  00C4
%e  &euml  LE17  LATIN SMALL LETTER E WITH DIAERESIS  .12345.XTV  00EB
%E  &Euml  LE18  LATIN CAPITAL LETTER E WITH DIAERESIS  .12345.XTV  00CB
%i  &iuml  LI17  LATIN SMALL LETTER I WITH DIAERESIS  .1.3.5.XTV  00EF
%I  &Iuml  LI18  LATIN CAPITAL LETTER I WITH DIAERESIS  .1.3.5.XTV  00CF
%o  &ouml  LO17  LATIN SMALL LETTER O WITH DIAERESIS  .12345.XTV  00F6
%O  &Ouml  LO18  LATIN CAPITAL LETTER O WITH DIAERESIS  .12345.XTV  00D6
%u  &uuml  LU17  LATIN SMALL LETTER U WITH DIAERESIS  .12345.XTV  00FC
%U  &Uuml  LU18  LATIN CAPITAL LETTER U WITH DIAERESIS  .12345.XTV  00DC
%w  &wuml  LW17  LATIN SMALL LETTER W WITH DIAERESIS  * ..........  1E85
%W  &Wuml  LW18  LATIN CAPITAL LETTER W WITH DIAERESIS  * ..........  1E84
%y  &yuml  LY17  LATIN SMALL LETTER Y WITH DIAERESIS  .1...5.XTV  00FF
%Y  &Yuml  LY18  LATIN CAPITAL LETTER Y WITH DIAERESIS  ......AXTV  0178

~a  &atilde  LA19  LATIN SMALL LETTER A WITH TILDE  .1..45.XTV  00E3
~A  &Atilde  LA20  LATIN CAPITAL LETTER A WITH TILDE  .1..45.XTV  00C3
~n  &ntilde  LN19  LATIN SMALL LETTER N WITH TILDE  .1.3.5.XTV  00F1
~N  &Ntilde  LN20  LATIN CAPITAL LETTER N WITH TILDE  .1.3.5.XTV  00D1
~o  &otilde  LO19  LATIN SMALL LETTER O WITH TILDE  .1..45.XTV  00F5
~O  &Otilde  LO20  LATIN CAPITAL LETTER O WITH TILDE  .1..45.XTV  00D5

*c  &ccaron  LC21  LATIN SMALL LETTER C WITH CARON  ..2.4..XTV  010D
*C  &Ccaron  LC22  LATIN CAPITAL LETTER C WITH CARON  ..2.4..XTV  010C
*d  &dcaron  LD21  LATIN SMALL LETTER D WITH CARON  ..2....XTV  010F
*D  &Dcaron  LD22  LATIN CAPITAL LETTER D WITH CARON  ..2....XTV  010E
*e  &ecaron  LE21  LATIN SMALL LETTER E WITH CARON  ..2....XTV  011B
*E  &Ecaron  LE22  LATIN CAPITAL LETTER E WITH CARON  ..2....XTV  011A
*l  &lcaron  LL21  LATIN SMALL LETTER L WITH CARON  ..2....XTV  013E
*L  &Lcaron  LL22  LATIN CAPITAL LETTER L WITH CARON  ..2....XTV  013D
*n  &ncaron  LN21  LATIN SMALL LETTER N WITH CARON  ..2....XTV  0148
*N  &Ncaron  LN22  LATIN CAPITAL LETTER N WITH CARON  ..2....XTV  0147
*r  &rcaron  LR21  LATIN SMALL LETTER R WITH CARON  ..2....XTV  0159
*R  &Rcaron  LR22  LATIN CAPITAL LETTER R WITH CARON  ..2....XTV  0158
*s  &scaron  LS21  LATIN SMALL LETTER S WITH CARON  ..2.4..XTV  0161
*S  &Scaron  LS22  LATIN CAPITAL LETTER S WITH CARON  ..2.4..XTV  0160
*t  &tcaron  LT21  LATIN SMALL LETTER T WITH CARON  ..2....XTV  0165
*T  &Tcaron  LT22  LATIN CAPITAL LETTER T WITH CARON  ..2....XTV  0164
*z  &zcaron  LZ21  LATIN SMALL LETTER Z WITH CARON  ..2.4..XTV  017E
*Z  &Zcaron  LZ22  LATIN CAPITAL LETTER Z WITH CARON  ..2.4..XTV  017D

#a  &abreve  LA23  LATIN SMALL LETTER A WITH BREVE  ..2....XTV  0103
#A  &Abreve  LA24  LATIN CAPITAL LETTER A WITH BREVE  ..2....XTV  0102
#g  &gbreve  LG23  LATIN SMALL LETTER G WITH BREVE  ...3.5AXTV  011F
#G  &Gbreve  LG24  LATIN CAPITAL LETTER G WITH BREVE  ...3.5AXTV  011E
#u  &ubreve  LU23  LATIN SMALL LETTER U WITH BREVE  ...3..AXTV  016D
#U  &Ubreve  LU24  LATIN CAPITAL LETTER U WITH BREVE  ...3..AXTV  016C

+o  &odblac  LO25  LATIN SMALL LETTER O WITH DOUBLE ACUTE  ..2....XTV  0151
+O  &Odblac  LO26  LATIN CAPITAL LETTER O WITH DOUBLE ACUTE  ..2....XTV  0150
+u  &udblac  LU25  LATIN SMALL LETTER U WITH DOUBLE ACUTE  ..2....XTV  0171
+U  &Udblac  LU26  LATIN CAPITAL LETTER U WITH DOUBLE ACUTE  ..2....XTV  0170

@a  &aring  LA27  LATIN SMALL LETTER A WITH RING ABOVE  .1..45.XTV  00E5
@A  &Aring  LA28  LATIN CAPITAL LETTER A WITH RING ABOVE  .1..45.XTV  00C5
@u  &uring  LU27  LATIN SMALL LETTER U WITH RING ABOVE  ..2....XTV  016F
@U  &Uring  LU28  LATIN CAPITAL LETTER U WITH RING ABOVE  ..2....XTV  016E

@c  &cdot  LC29  LATIN SMALL LETTER C WITH DOT ABOVE  ...3..AXTV  010B
@C  &Cdot  LC30  LATIN CAPITAL LETTER C WITH DOT ABOVE  ...3..AXTV  010A
@e  &edot  LE29  LATIN SMALL LETTER E WITH DOT ABOVE  ....4.AXTV  0117
@E  &Edot  LE30  LATIN CAPITAL LETTER E WITH DOT ABOVE  ....4.AXTV  0116
@g  &gdot  LG29  LATIN SMALL LETTER G WITH DOT ABOVE  ...3..AXTV  0121
@G  &Gdot  LG30  LATIN CAPITAL LETTER G WITH DOT ABOVE  ...3..AXTV  0120
@I  &Idot  LI30  LATIN CAPITAL LETTER I WITH DOT ABOVE  ...3.5AXTV  0130
@i  &inodot  LI61  LATIN SMALL LETTER DOTLESS I  ...3.5ACTV  0131
@z  &zdot  LZ29  LATIN SMALL LETTER Z WITH DOT ABOVE  ..23...XTV  017C
@Z  &Zdot  LZ30  LATIN CAPITAL LETTER Z WITH DOT ABOVE  ..23...XTV  017B

=a  &amacr  LA31  LATIN SMALL LETTER A WITH MACRON  ....4.AXTV  0101
=A  &Amacr  LA32  LATIN CAPITAL LETTER A WITH MACRON  ....4.AXTV  0100
=e  &emacr  LE31  LATIN SMALL LETTER E WITH MACRON  ....4.AXTV  0113
=E  &Emacr  LE32  LATIN CAPITAL LETTER E WITH MACRON  ....4.AXTV  0112
=i  &imacr  LI31  LATIN SMALL LETTER I WITH MACRON  ....4.AXTV  012B
=I  &Imacr  LI32  LATIN CAPITAL LETTER I WITH MACRON  ....4.AXTV  012A
=o  &omacr  LO31  LATIN SMALL LETTER O WITH MACRON  ....4.AXTV  014D
=O  &Omacr  LO32  LATIN CAPITAL LETTER O WITH MACRON  ....4.AXTV  014C
=u  &umacr  LU31  LATIN SMALL LETTER U WITH MACRON  ....4.AXTV  016B
=U  &Umacr  LU32  LATIN CAPITAL LETTER U WITH MACRON  ....4.AXTV  016A

=d  &dstrok  LD61  LATIN SMALL LETTER D WITH STROKE  ..2.4..CTV  0111
=D  &Dstrok  LD62  LATIN CAPITAL LETTER D WITH STROKE  ..2.4..CTV  0110
=h  &hstrok  LH61  LATIN SMALL LETTER H WITH STROKE  ...3..ACTV  0127
=H  &Hstrok  LH62  LATIN CAPITAL LETTER H WITH STROKE  ...3..ACTV  0126
=l  &lstrok  LL61  LATIN SMALL LETTER L WITH STROKE  ..2....CTV  0142
=L  &Lstrok  LL62  LATIN CAPITAL LETTER L WITH STROKE  ..2....CTV  0141
$o  &ostrok  LO61  LATIN SMALL LETTER O WITH STROKE  .1..45.CTV  00F8
$O  &Ostrok  LO62  LATIN CAPITAL LETTER O WITH STROKE  .1..45.CTV  00D8
=t  &tstrok  LT61  LATIN SMALL LETTER T WITH STROKE  ....4.ACTV  0167
=T  &Tstrok  LT62  LATIN CAPITAL LETTER T WITH STROKE  ....4.ACTV  0166

$c  &ccedil  LC41  LATIN SMALL LETTER C WITH CEDILLA  .123.5.XTV  00E7
$C  &Ccedil  LC42  LATIN CAPITAL LETTER C WITH CEDILLA  .123.5.XTV  00C7
$g  &gcedil  LG41  LATIN SMALL LETTER G WITH CEDILLA  ....4.AXTV  0123
$G  &Gcedil  LG42  LATIN CAPITAL LETTER G WITH CEDILLA  ....4.AXTV  0122
$k  &kcedil  LK41  LATIN SMALL LETTER K WITH CEDILLA  ....4.AXTV  0137
$K  &Kcedil  LK42  LATIN CAPITAL LETTER K WITH CEDILLA  ....4.AXTV  0136
$l  &lcedil  LL41  LATIN SMALL LETTER L WITH CEDILLA  ....4.AXTV  013C
$L  &Lcedil  LL42  LATIN CAPITAL LETTER L WITH CEDILLA  ....4.AXTV  013B
$n  &ncedil  LN41  LATIN SMALL LETTER N WITH CEDILLA  ....4.AXTV  0146
$N  &Ncedil  LN42  LATIN CAPITAL LETTER N WITH CEDILLA  ....4.AXTV  0145
$r  &rcedil  LR41  LATIN SMALL LETTER R WITH CEDILLA  ....4.AXTV  0157
$R  &Rcedil  LR42  LATIN CAPITAL LETTER R WITH CEDILLA  ....4.AXTV  0156
$s  &scedil  LS41  LATIN SMALL LETTER S WITH CEDILLA  ..23.5.XTV  015F
$S  &Scedil  LS42  LATIN CAPITAL LETTER S WITH CEDILLA  ..23.5.XTV  015E
$t  &tcedil  LT41  LATIN SMALL LETTER T WITH CEDILLA  ..2....XTV  0163
$T  &Tcedil  LT42  LATIN CAPITAL LETTER T WITH CEDILLA  ..2....XTV  0162

$a  &aogon  LA43  LATIN SMALL LETTER A WITH OGONEK  ..2.4..XTV  0105
$A  &Aogon  LA44  LATIN CAPITAL LETTER A WITH OGONEK  ..2.4..XTV  0104
$e  &eogon  LE43  LATIN SMALL LETTER E WITH OGONEK  ..2.4..XTV  0119
$E  &Eogon  LE44  LATIN CAPITAL LETTER E WITH OGONEK  ..2.4..XTV  0118
$i  &iogon  LI43  LATIN SMALL LETTER I WITH OGONEK  ....4.AXTV  012F
$I  &Iogon  LI44  LATIN CAPITAL LETTER I WITH OGONEK  ....4.AXTV  012E
$u  &uogon  LU43  LATIN SMALL LETTER U WITH OGONEK  ....4.AXTV  0173
$U  &Uogon  LU44  LATIN CAPITAL LETTER U WITH OGONEK  ....4.AXTV  0172

&a  &aelig  LA51  LATIN SMALL LETTER AE  .1..45.CTV  00E6
&A  &AElig  LA52  LATIN CAPITAL LETTER AE  .1..45.CTV  00C6
&i  &ijlig  LI51  LATIN SMALL LIGATURE I J  ......ACTV  0133
&I  &IJlig  LI52  LATIN CAPITAL LIGATURE I J  ......ACTV  0132
&o  &oelig  LO51  LATIN SMALL LIGATURE O E  ......ACTV  0153
&O  &OElig  LO52  LATIN CAPITAL LIGATURE O E  ......ACTV  0152
&s  &szlig  LS61  LATIN SMALL LETTER SHARP S (German)  .12345.CTV  00DF
&n  &eng  LN61  LATIN SMALL LETTER ENG (Sami)  ....4.ACTV  014B
&N  &ENG  LN62  LATIN CAPITAL LETTER ENG (Sami)  ....4.ACTV  014A
&d  &eth  LD63  LATIN SMALL LETTER ETH (Icelandic)  .1....ACTV  00F0
&D  &ETH  LD64  LATIN CAPITAL LETTER ETH (Icelandic)  .1......TV  00D0
&t  &thorn  LT63  LATIN SMALL LETTER THORN (Icelandic)  .1....ACTV  00FE
&T  &THORN  LT64  LATIN CAPITAL LETTER THORN (Icelandic)  .1....ACTV  00DE
Not included Letters
~i  &itilde  LI19  LATIN SMALL LETTER I WITH TILDE  ....4.AXTV  0129
~I  &Itilde  LI20  LATIN CAPITAL LETTER I WITH TILDE  ....4.AXTV  0128
~u  &utilde  LU19  LATIN SMALL LETTER U WITH TILDE  ....4.AXTV  0169
~U  &Utilde  LU20  LATIN CAPITAL LETTER U WITH TILDE  ....4.AXTV  0168
&k  &kgreen  LK61  LATIN SMALL LETTER KRA (Greenlandic)  ....4.ACTV  0138
&l  &lmidot  LL63  LATIN SMALL LETTER L WITH MIDDLE DOT  ......ACTV  0140
&L  &Lmidot  LL64  LATIN CAPITAL LETTER L WITH MIDDLE DOT  ......ACTV  013F
'n  &napos  LN63  LATIN SMALL LETTER N PRECEDED BY APOSTROPHE  ......ACTV  0149
Digits
ND01  DIGIT ONE  0.......TV  0031
ND02  DIGIT TWO  0.......TV  0032
ND03  DIGIT THREE  0.......TV  0033
ND04  DIGIT FOUR  0.......TV  0034
ND05  DIGIT FIVE  0.......TV  0035
ND06  DIGIT SIX  0.......TV  0036
ND07  DIGIT SEVEN  0.......TV  0037
ND08  DIGIT EIGHT  0.......TV  0038
ND09  DIGIT NINE  0.......TV  0039
ND10  DIGIT ZERO  0.......TV  0030

TABLE 2

COMPLETE REPERTOIRE OF SPECIAL CHARACTERS REQUIRED FOR LATIN WRITTEN EUROPEAN LANGUAGES
AS INCLUDED IN ISO/IEC 10367

VERSION 2.1
1993-11-02, corr. 1998-08-04
J. W. van Wingen

Grouped to Short Identifier (SID)
Transformation to ASCII in first column,
SGML public entities in 2nd column
Indication in columns 63-72 as in Table 1
Transformation to ASCII  SGML public entries  Short identifiers (SID)  Letters name  Tables in ISO 10367  Code in ISO 10646
@1  &sup1  NS01  SUPERSCRIPT ONE  .1...5.C..  00B9
@2  &sup2  NS02  SUPERSCRIPT TWO  .1.3.5.CTV  00B2
@3  &sup3  NS03  SUPERSCRIPT THREE  .1.3.5.CTV  00B3

_2  &frac12  NF01  VULGAR FRACTION ONE HALF  .1.3.5.CTV  00BD
_4  &frac14  NF04  VULGAR FRACTION ONE QUARTER  .1...5.CTV  00BC
_3  &frac34  NF05  VULGAR FRACTION THREE QUARTERS  .1...5.CTV  00BE
=1  &frac18  NF18  VULGAR FRACTION ONE EIGHTH  ......AC.V  215B
=3  &frac38  NF19  VULGAR FRACTION THREE EIGHTHS  ......AC.V  215C
=5  &frac58  NF20  VULGAR FRACTION FIVE EIGHTHS  ......AC.V  215D
=7  &frac78  NF21  VULGAR FRACTION SEVEN EIGHTHS  ......AC.V  215E

++  &plus  SA01  PLUS SIGN  0.......TV  002B
&lt  SA03  LESS-THAN SIGN  0.......TV  003C
==  &equals  SA04  EQUALS SIGN  0.......TV  003D
>>  &gt  SA05  GREATER-THAN SIGN  0.......TV  003E
_+  &plusmn  SA02  PLUS-MINUS SIGN  .1...5.CTV  00B1
_:  &divide  SA06  DIVISION SIGN  .12345.CTV  00F7
_*  &times  SA07  MULTIPLICATION SIGN  .12345.CTV  00D7

_f  &curren  SC01  CURRENCY SIGN  .12345.CTV  00A4
_L  &pound  SC02  POUND SIGN  .1.3.5.CTV  00A3
$$  &dollar  SC03  DOLLAR SIGN  0.......TV  0024
_c  &cent  SC04  CENT SIGN  .1...5.CTV  00A2
_Y  &yen  SC05  YEN SIGN  .1...5.CTV  00A5

@/  &acute  SD11  ACUTE ACCENT  .12345.XT.  00B4
@\  &grave  SD13  GRAVE ACCENT  0......XT.  0060
@>  &circ  SD15  CIRCUMFLEX ACCENT  0......XT.  005E
@%  &die  SD17  DIAERESIS  .12345.XT.  00A8
@$  &tilde  SD19  TILDE  0......XT.  007E
@*  &caron  SD21  CARON  ..2.4..XT.  02C7
@#  &breve  SD23  BREVE  ..23...XT.  02D8
@"  &dblac  SD25  DOUBLE ACUTE ACCENT  ..2....XT.  02DD
@0  &ring  SD27  RING ABOVE  .......XT.  02DA
@.  &dot  SD29  DOT ABOVE  ..234..XT.  02D9
@=  &macron  SD31  MACRON  .1..45.XT.  00AF
_)  &cedil  SD41  CEDILLA  .12345.XT.  00B8
_(  &ogon  SD43  OGONEK  ..2.4..XT.  02DB

##  &num  SM01  NUMBER SIGN  0.......TV  0023
%%  &percnt  SM02  PERCENT SIGN  0.......TV  0025
&&  &amp  SM03  AMPERSAND  0.......TV  0026
**  &ast  SM04  ASTERISK  0.......TV  002A
@@  &commat  SM05  COMMERCIAL AT  0.......TV  0040
*(  &lsqb  SM06  LEFT SQUARE BRACKET  0.......T.  005B
\\  &bsol  SM07  REVERSE SOLIDUS  0.........  005C
*)  &rsqb  SM08  RIGHT SQUARE BRACKET  0.......T.  005D
&lcub  SM11  LEFT CURLY BRACKET  0.........  007B
_-  &horbar  SM12  HORIZONTAL BAR  ......AC.V  2015
&verbar  SM13  VERTICAL LINE  0.......TV  007C
&rcub  SM14  RIGHT CURLY BRACKET  0.........  007D
_m  &micro  SM17  MICRO SIGN  .1.3.5.CTV  00B5
_O  &ohm  SM18  OHM SIGN  ......ACTV  2126
@0  &deg  SM19  DEGREE SIGN  .12345.CTV  00B0
_o  &ordm  SM20  MASCULINE ORDINAL INDICATOR  .1...5.CTV  00BA
_a  &ordf  SM21  FEMININE ORDINAL INDICATOR  .1...5.CTV  00AA
#S  &sect  SM24  SECTION SIGN  .12345.CTV  00A7
#P  &para  SM25  PILCROW SIGN  .1...5.CTV  00B6
#.  &middot  SM26  MIDDLE DOT  .1.3.5.CTV  00B7
_<  &larr  SM30  LEFTWARDS ARROW  ......AC.V  2190
_>  &rarr  SM31  RIGHTWARDS ARROW  ......AC.V  2192
_A  &uarr  SM32  UPWARDS ARROW  ......AC.V  2191
_V  &darr  SM33  DOWNWARDS ARROW  ......AC.V  2193
#c  &copy  SM52  COPYRIGHT SIGN  .1...5.C..  00A9
#r  &reg  SM53  REGISTERED SIGN  .1...5.C..  00AE
#t  &trade  SM54  TRADE MARK SIGN  ......AC..  2122
*|  &brvbar  SM65  BROKEN BAR  .1...5.C..  00A6
&not  SM66  NOT SIGN  .1...5.C..  00AC
_J  &sung  SM93  MUSIC NOTE (EIGHTH NOTE IN 10646)  ......AC..  266A

SP  &blank  SP01  SPACE  0.......TV  0020
&excl  SP02  EXCLAMATION MARK  0.......TV  0021
*!  &iexcl  SP03  INVERTED EXCLAMATION MARK  .1...5.CTV  00A1
&quot  SP04  QUOTATION MARK  0.......TV  0022
&apos  SP05  APOSTROPHE  0.......TV  0027
( &lpar  SP06  LEFT PARENTHESIS  0.......TV  0028
&rpar  SP07  RIGHT PARENTHESIS  0.......TV  0029
&comma  SP08  COMMA  0.......TV  002C
__  &lowbar  SP09  LOW LINE  0.......TV  005F
&hyphen  SP10  HYPHEN-MINUS  0.......TV  002D
&period  SP11  FULL STOP  0.......TV  002E
//  &sol  SP12  SOLIDUS  0.......TV  002F
&colon  SP13  COLON  0.......TV  003A
&semi  SP14  SEMICOLON  0.......TV  003B
&quest  SP15  QUESTION MARK  0.......TV  003F
*?  &iquest  SP16  INVERTED QUESTION MARK  .1...5.CTV  00BF
*<  &laquo  SP17  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK  .1...5.CTV  00AB
*>  &raquo  SP18  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK  .1...5.CTV  00BB
@(  &lsquo  SP19  LEFT SINGLE QUOTATION MARK  ......AC.V  2018
@)  &rsquo  SP20  RIGHT SINGLE QUOTATION MARK  ......AC.V  2019
@{  &ldquo  SP21  LEFT DOUBLE QUOTATION MARK  ......AC.V  201C
@}  &rdquo  SP22  RIGHT DOUBLE QUOTATION MARK  ......AC.V  201D

NBSP  &nbsp  SP31  NO-BREAK SPACE  .12345.C..  00A0
SHY  &shy  SP32  SOFT HYPHEN  .12345.C..  00AD
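
A row of the tables above can be split into its columns mechanically. The following is a minimal sketch in Python, assuming that columns are separated by two or more spaces and that the two leftmost columns may be absent; the names `Row` and `parse_row` are hypothetical and not part of this document. Digits in the indication column are read as table numbers, as the column heading suggests; the letters in that column are kept verbatim, since their meaning is defined with Table 1.

```python
import re
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    ascii_form: Optional[str]  # "Transformation to ASCII" column, if present
    entity: Optional[str]      # SGML public entity, if present
    sid: str                   # short identifier (SID)
    name: str                  # character name
    tables: List[int]          # digits found in the indication column
    indication: str            # the indication column, kept verbatim
    code: int                  # code in ISO/IEC 10646


def parse_row(line: str) -> Row:
    # Assumption: columns are separated by two or more spaces.
    fields = re.split(r"\s{2,}", line.strip())
    if len(fields) < 4:
        raise ValueError("expected at least SID, name, indication and code")
    # Read from the right, because the two leftmost columns are optional.
    code = int(fields.pop(), 16)
    indication = fields.pop()
    name = fields.pop()
    sid = fields.pop()
    entity = fields.pop() if fields else None
    ascii_form = fields.pop() if fields else None
    tables = [int(ch) for ch in indication if ch.isdigit()]
    return Row(ascii_form, entity, sid, name, tables, indication, code)


row = parse_row(
    "$o  &ostrok  LO61  LATIN SMALL LETTER O WITH STROKE  .1..45.CTV  00F8"
)
print(row.sid, row.tables, format(row.code, "04X"))  # LO61 [1, 4, 5] 00F8
```

Reading the fields from the right keeps the sketch usable for rows such as the digits, which carry neither an ASCII transformation nor an SGML entity.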

ANNEX TO THE ATLANTIC SUBSET (normative)

CHARACTERS CODED IN ISO/IEC 10646
BUT NOT INCLUDED IN THE ATLANTIC SUBSET

VERSION 1.0
1994-10-28, corr. 1998-08-04
J. W. van Wingen

ISO/IEC 10646-1:1993 includes several characters for which no justification could be found for inclusion in the Atlantic Subset. Nevertheless, in certain fields of application, too restricted to be of general interest, characters are used for which ISO/IEC 10646 specifies a code. To serve users who do not want to resort to the complete International Standard for information on only a few additional characters, a selection has been made. To give a complete picture, letters from the Atlantic Subset itself are sometimes added to a list. To allow referencing from other European or national standards, a name is specified for those subrepertoires of ISO/IEC 10646-1 that are identified as suitable. This name, the set it covers, and the coding of the characters are the only normative elements in this Annex.

A.1 Danish accented letters

To indicate stress on syllables, or a difference in pronunciation, an ACUTE ACCENT may be applied to the characters for vowels. These accented letters have been officially classified as not essential to the language and are only rarely used. The following additional characters require specification of coding.
 
 
Transformation to ASCII  SGML public entries  Short identifiers (SID)  Letters name  Tables in ISO 10367  Code in ISO 10646
      LATIN SMALL LIGATURE A E WITH ACUTE  ..........  01FD
      LATIN CAPITAL LIGATURE A E WITH ACUTE  ..........  01FC
      LATIN SMALL LETTER A WITH RING AND ACUTE  ..........  01FB
      LATIN CAPITAL LETTER A WITH RING AND ACUTE  ..........  01FA
      LATIN SMALL LETTER O WITH STROKE AND ACUTE  ..........  01FF
      LATIN CAPITAL LETTER O WITH STROKE AND ACUTE  ..........  01FE

Named subrepertoire: ADDITIONAL DANISH

A.2 Irish dotted consonants

Up to about 1940 a special font was used for printing Irish. This included a number of consonants WITH DOT ABOVE. These are:
 
 
Transformation to ASCII  SGML public entries  Short identifiers (SID)  Letters name  Tables in ISO 10367  Code in ISO 10646
@b  &bdot  LB29  LATIN SMALL LETTER B WITH DOT ABOVE  ..........  1E03
@B  &Bdot  LB30  LATIN CAPITAL LETTER B WITH DOT ABOVE  ..........  1E02
@c  &cdot  LC29  LATIN SMALL LETTER C WITH DOT ABOVE  ..........  010B
@C  &Cdot  LC30  LATIN CAPITAL LETTER C WITH DOT ABOVE  ..........  010A
@d  &ddot  LD29  LATIN SMALL LETTER D WITH DOT ABOVE  ..........  1E0B
@D  &Ddot  LD30  LATIN CAPITAL LETTER D WITH DOT ABOVE  ..........  1E0A
@f  &fdot  LF29  LATIN SMALL LETTER F WITH DOT ABOVE  ..........  1E1F
@F  &Fdot  LF30  LATIN CAPITAL LETTER F WITH DOT ABOVE  ..........  1E1E
@g  &gdot  LG29  LATIN SMALL LETTER G WITH DOT ABOVE  ..........  0121
@G  &Gdot  LG30  LATIN CAPITAL LETTER G WITH DOT ABOVE  ..........  0120
@m  &mdot  LM29  LATIN SMALL LETTER M WITH DOT ABOVE  ..........  1E41
@M  &Mdot  LM30  LATIN CAPITAL LETTER M WITH DOT ABOVE  ..........  1E40
@p  &pdot  LP29  LATIN SMALL LETTER P WITH DOT ABOVE  ..........  1E57
@P  &Pdot  LP30  LATIN CAPITAL LETTER P WITH DOT ABOVE  ..........  1E56
@s  &sdot  LS29  LATIN SMALL LETTER S WITH DOT ABOVE  ..........  1E61
@S  &Sdot  LS30  LATIN CAPITAL LETTER S WITH DOT ABOVE  ..........  1E60
@t  &tdot  LT29  LATIN SMALL LETTER T WITH DOT ABOVE  ..........  1E6B
@T  &Tdot  LT30  LATIN CAPITAL LETTER T WITH DOT ABOVE  ..........  1E6A

Named subrepertoire: ADDITIONAL IRISH
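
The codes in the table above can be cross-checked against the character data shipped with a programming language. The following is a minimal sketch in Python, assuming the standard `unicodedata` module; the helper `decomposes_to_dotted` is a hypothetical name. It confirms that each coded letter is canonically equivalent to its base consonant followed by COMBINING DOT ABOVE (0307), a character that is not itself listed in this Annex.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"

# Codes of the small letters, copied from the table above.
SMALL_DOTTED = {
    "b": 0x1E03, "c": 0x010B, "d": 0x1E0B, "f": 0x1E1F, "g": 0x0121,
    "m": 0x1E41, "p": 0x1E57, "s": 0x1E61, "t": 0x1E6B,
}


def decomposes_to_dotted(code: int, base: str) -> bool:
    """True if the coded character is canonically base + dot above."""
    decomposed = unicodedata.normalize("NFD", chr(code))
    return decomposed == base + COMBINING_DOT_ABOVE


for base, code in SMALL_DOTTED.items():
    # In this table each capital is coded one position before its small letter.
    assert decomposes_to_dotted(code, base)
    assert decomposes_to_dotted(code - 1, base.upper())
```

The check runs silently when every code matches; a mistyped code in the dictionary would raise an AssertionError.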

A.3 Complete Welsh

In Welsh, w and y serve as vowels as well. Any vowel may carry the ACUTE, GRAVE, CIRCUMFLEX or DIAERESIS. Not all combinations for w and y were included in the repertoire of ISO/IEC 6937:1994, but these are part of this Subset. For information, those in this Subset but not in ISO/IEC 6937 are indicated with a * (star).

Alphabetically sorted to SID
 
Transformation to ASCII  SGML public entries  Short identifiers (SID)  Letters name  Tables in ISO 10367  Code in ISO 10646
w   LW01  LATIN SMALL LETTER W  0.......TV  0077
W   LW02  LATIN CAPITAL LETTER W  0.......TV  0057
/w   LW11  LATIN SMALL LETTER W WITH ACUTE  * ..........  1E83
/W   LW12  LATIN CAPITAL LETTER W WITH ACUTE  * ..........  1E82
\w   LW13  LATIN SMALL LETTER W WITH GRAVE  * ..........  1E81
\W   LW14  LATIN CAPITAL LETTER W WITH GRAVE  * ..........  1E80
>w   LW15  LATIN SMALL LETTER W WITH CIRCUMFLEX  ......AXTV  0175
>W   LW16  LATIN CAPITAL LETTER W WITH CIRCUMFLEX  ......AXTV  0174
%w   LW17  LATIN SMALL LETTER W WITH DIAERESIS  * ..........  1E85
%W   LW18  LATIN CAPITAL LETTER W WITH DIAERESIS  * ..........  1E84
y   LY01  LATIN SMALL LETTER Y  0.......TV  0079
Y   LY02  LATIN CAPITAL LETTER Y  0.......TV  0059
/y   LY11  LATIN SMALL LETTER Y WITH ACUTE  .12...AXTV  00FD
/Y   LY12  LATIN CAPITAL LETTER Y WITH ACUTE  .12...AXTV  00DD
\y   LY13  LATIN SMALL LETTER Y WITH GRAVE  * ..........  1EF3
\Y   LY14  LATIN CAPITAL LETTER Y WITH GRAVE  * ..........  1EF2
>y   LY15  LATIN SMALL LETTER Y WITH CIRCUMFLEX  ......AXTV  0177
>Y   LY16  LATIN CAPITAL LETTER Y WITH CIRCUMFLEX  ......AXTV  0176
%y   LY17  LATIN SMALL LETTER Y WITH DIAERESIS  .1...5.XTV  00FF
%Y   LY18  LATIN CAPITAL LETTER Y WITH DIAERESIS  ......AXTV  0178
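
The two-character ASCII transformations in the first column follow a simple pattern: a prefix character for the accent, then the base letter. The following is a minimal sketch in Python of how such a form could be turned into a code, assuming the standard `unicodedata` module. The mapping of the four prefixes to combining characters and the name `welsh_code` are assumptions of this sketch, not part of the table.

```python
import unicodedata

# Prefixes as used in the first column of the Welsh table. Their mapping to
# combining characters is an assumption made for this sketch.
PREFIX_TO_COMBINING = {
    "/": "\u0301",   # COMBINING ACUTE ACCENT
    "\\": "\u0300",  # COMBINING GRAVE ACCENT
    ">": "\u0302",   # COMBINING CIRCUMFLEX ACCENT
    "%": "\u0308",   # COMBINING DIAERESIS
}


def welsh_code(ascii_form: str) -> int:
    """Return the code for a two-character ASCII form such as '/w'."""
    if len(ascii_form) != 2:
        raise ValueError("expected a prefix followed by a base letter")
    prefix, base = ascii_form
    composed = unicodedata.normalize("NFC", base + PREFIX_TO_COMBINING[prefix])
    if len(composed) != 1:
        raise ValueError(f"no single code for {ascii_form!r}")
    return ord(composed)


for form in ["/w", "\\W", ">y", "%Y"]:
    print(form, format(welsh_code(form), "04X"))
# /w 1E83
# \W 1E80
# >y 0177
# %Y 0178
```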

A.4 Sami dialect

The letters required for writing the Sami language are included in this Subset, but not those supposed to be needed for the Skolt Sami dialect, spoken by some 500 people only. The letters are specified in Annex A (informative) of ISO 8859-10, registered as ISO-IR 158.

For this set a coding in the BMP was assumed to exist. However, it appeared that not all of these characters were included in the BMP repertoire, or that some were possibly included under a different name. Given this confused situation, the two-octet coding of these characters could not be listed here.

A.5 Letters in instruction books

In some books for teaching a language, Latin in particular, use is made of the MACRON and BREVE on the letters a e i o u, indicating long or short vowels. Of these the following are not included in this Subset, but in the BMP they have a code assigned to them.
 
 
Transformation to ASCII  SGML public entries  Short identifiers (SID)  Letters name  Tables in ISO 10367  Code in ISO 10646
#e  &ebreve  LE23  LATIN SMALL LETTER E WITH BREVE  ..........  0115
#E  &Ebreve  LE24  LATIN CAPITAL LETTER E WITH BREVE  ..........  0114
#i  &ibreve  LI23  LATIN SMALL LETTER I WITH BREVE  ..........  012D
#I  &Ibreve  LI24  LATIN CAPITAL LETTER I WITH BREVE  ..........  012C
#o  &obreve  LO23  LATIN SMALL LETTER O WITH BREVE  ..........  014F
#O  &Obreve  LO24  LATIN CAPITAL LETTER O WITH BREVE  ..........  014E

Named subrepertoire: ADDITIONAL LATIN
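
The four hexadecimal digits in the last column give the two-octet code directly: the first pair is the row octet, the second pair the cell octet, with the most significant octet first. The following is a minimal sketch in Python; the function name `ucs2_octets` is hypothetical. For characters in the BMP the result coincides with what Python's `utf-16-be` codec produces, which the sketch uses as a cross-check.

```python
def ucs2_octets(code: int) -> bytes:
    """Two-octet form of a BMP code: row octet first, then cell octet."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code lies outside the BMP")
    return code.to_bytes(2, "big")


# LATIN SMALL LETTER E WITH BREVE, code 0115 in the table above.
octets = ucs2_octets(0x0115)
print(octets.hex(" ").upper())  # 01 15

# Cross-check against the built-in codec (valid for BMP characters).
assert octets == "\u0115".encode("utf-16-be")
```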

A.6 Obsolete letters

Some letters included in ISO standards earlier than ISO/IEC 10646 have become obsolete. It is now recognized that one letter for Afrikaans and two for Catalan never existed at all; these letters are now deprecated in ISO/IEC 6937:1994. Five letters for Greenlandic disappeared from its orthography in 1973. All eight letters remain in the repertoire of the UCS for compatibility with the past. The full name and the code are given at the start of Table 1.

A.7 Vietnamese

Unlike the writing systems of many of the languages used in its neighbouring countries, Vietnamese is written in Latin script. This, together with the presence of groups of people from Vietnam living in Europe, makes it desirable to include information on the Vietnamese repertoire and its coding in the BMP.

There is only a single additional consonant letter, but besides the usual five letters for vowels, another seven are needed. Furthermore, the language has six tones, on which the meaning of a word strongly depends and which have to be indicated. The letters F, f, J, j, W, w, Z, z are not needed. As a consequence, the Vietnamese repertoire consists of 178 letters, of which 134 do not occur in ASCII. For reasons of space the list of letters with their full names and their coding in UCS-2 is not included here.

Named subrepertoire: ADDITIONAL VIETNAMESE
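
The count of 178 letters can be reproduced by enumeration. The following is a minimal sketch in Python, assuming the standard `unicodedata` module. The concrete letters and tone marks are not listed in this Annex; the choice below (d with stroke as the additional consonant, y counted among the seven further vowel letters, and the five combining marks for the marked tones) is an assumption of this sketch, and `vietnamese_letters` is a hypothetical name.

```python
import unicodedata

# Assumed small base letters: 16 plain consonants plus d with stroke, and
# 12 vowels (a e i o u, then y and six modified vowel letters).
CONSONANTS = "bcdghklmnpqrstvx" + "\u0111"
VOWELS = "aeiou" + "y\u0103\u00e2\u00ea\u00f4\u01a1\u01b0"

# Six tones: unmarked, acute, grave, hook above, tilde, dot below.
TONE_MARKS = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]


def vietnamese_letters() -> set:
    """Enumerate the precomposed small and capital letters."""
    small = set(CONSONANTS)
    for vowel in VOWELS:
        for mark in TONE_MARKS:
            # NFC yields the single precomposed character for each pair.
            small.add(unicodedata.normalize("NFC", vowel + mark))
    capital = {ch.upper() for ch in small}
    return small | capital


letters = vietnamese_letters()
print(len(letters))                             # 178
print(sum(1 for ch in letters if ord(ch) > 0x7F))  # 134
```

Per case this gives 17 consonants and 12 x 6 = 72 vowel forms, that is 89 letters, or 178 for both cases. The second figure counts the letters whose code lies above 007F.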

A.8 Transliterations

Several scripts are transliterated into the Latin script with the help of special letters, mostly those carrying a macron, or a dot above or below. Some of these are specified in ISO TC46 standards, others are used in scientific publications under general agreement. Where the original script has only one case of letters, the need for capital forms in Latin transliteration is doubtful, and their use arbitrary. ISO/IEC 10646-1 provides coding for transliterated Chinese, Arabic and Sanskrit. Further details are beyond the scope of this publication.

A.9 PC special characters

Several personal computers use code tables of their own, with special characters that do not occur in ISO/IEC 10367. Nevertheless, many are included in the BMP. Because it is difficult to identify these with those in manufacturers' tables, no named subrepertoire is indicated.



1995-1999. J. W. van Wingen
1999. Designed by Yuri Demchenko, TERENA