ATLANTIC SUBSET
OF THE INTERNATIONAL STANDARD
ISO/IEC 10646 UNIVERSAL MULTIPLE-OCTET CODED CHARACTER
SET
1994-09-15
corrected 1998-08-04
J. W. van Wingen
NOTE: The following text presents what could have been an Atlantic Standard, if such a thing existed. It thus has no formal status at all. Nevertheless, it contains everything a serious user needs to know when he wants to use ISO/IEC 10646-1 for applications restricted to the Atlantic part of the world, without spending $400 on the complete thing.
Part 1: General structure and Latin script
INTRODUCTION
It is commonly understood that the whole of the repertoire of ISO/IEC 10646, Universal Multiple-octet Coded Character Set, is not a firm requirement for large groups of users of European languages on both sides of the Atlantic. To give manufacturers and users guidance, so that they need not make their own selection, a subset is defined that specifies the coding of those characters identified as the total character repertoire needed for European languages. A larger subset than the minimum set specified here may be needed for special applications; such extensions are not prohibited. Some recommendations are given in an Annex for sets of characters needed with identified applications.
The text of this Atlantic Standard (AS) is based on that of ISO/IEC 10646-1:1993 where possible, after removal of everything that is not relevant to the Atlantic situation. It makes this AS a self-contained document which does not require from the reader, if he is interested only in characters used in European languages, any consultation of ISO/IEC 10646-1:1993. On the other hand, if the reader wants to understand the principles of multi-octet coding, and the way these are applied to any script, the study of ISO/IEC 10646-1:1993 is an absolute requirement, in particular where information on a transformation format (UTF-8), retransmission, octet value representation notations, character naming guidelines, is wanted (presented in its Annexes G, H, J, K, R). Thus this AS does not replace the ISO/IEC standard, not even where European scripts are in exclusive use. The text as presented just states what is needed with European languages to specify the coding of the characters contained in this AS, and to indicate requirements to conforming equipment or other character supporting product at procurement. Should, despite the great care taken in the preparation of this document, the text of this AS lead to an interpretation or to a conclusion different from that reached from reading ISO/IEC 10646-1:1993, then that from the latter will prevail.
The numbers of the original clauses of ISO/IEC 10646-1 are given between parentheses behind the number in the heading of the corresponding clause of this Subset, to facilitate comparison. Further reference to this AS will be made as to "this Subset".
Only two-octet coding is used for characters in this Subset.
No levels of implementation are specified.
1 (1) SCOPE
This Atlantic Standard specifies a subset of ISO/IEC 10646, Universal Multiple-octet Coded Character Set, required for coding the character repertoire in modern use of the listed European languages, written with Latin, Greek or Cyrillic script.
Covered are:
Official languages using Latin script:
Albanian
Latvian
Official languages using Greek script:
Greek
Official languages using Cyrillic script:
Bulgarian
Byelorussian
Macedonian
Russian
Serbian
Ukrainian
Regional languages using Latin script:
Basque (France, Spain)
Breton (France)
Catalan (France, Spain, Andorra)
Faroese (Denmark)
Frisian (Netherlands)
Gaelic (UK)
Galician (Spain)
Greenlandic (Denmark)
Rumantsch (Switzerland)
Sami (Norway, Sweden, Finland)
Sorbian (Germany)
Welsh (UK)
The coding of the repertoires for Afrikaans and Esperanto is included in some normative tables for compatibility with the repertoire of ISO/IEC 6937:1994.
Part 1 of this AS covers Latin script; other parts will specify the coding of Greek and Cyrillic script.
The coding method used is that of two-octet form (UCS-2), because all required characters fit in the Basic Multilingual Plane (BMP) that is specified in ISO/IEC 10646-1:1993. No difference exists between any coding in this Subset and the UCS-2 coding of the same character.
A number of subrepertoires of this Subset is indicated, identified with a name, to enable the user to state his requirements in terms of options.
Information on the coding of some characters outside the repertoire specified in ISO/IEC 10367 for use in special applications required for restricted groups of European users is presented in a normative Annex.
2 (2) CONFORMANCE
2.1 Conformance of information interchange
A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with this Subset if all the coded representations of characters within that CC-data-element conform to the requirements of clause 6.
A claim of conformance shall identify whether the European Latin, the European Greek, the European Cyrillic or the European Special Character Repertoire, or any other subrepertoire of this Subset specified in this AS, or a combination of these, is adopted.
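The check described in 2.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `conforms` and the tiny sample repertoire `SAMPLE` are hypothetical, control functions are ignored, and a real check would load the full repertoire from the tables of clause 14.

```python
def conforms(data: bytes, repertoire: set) -> bool:
    """Return True if every two-octet code in data belongs to repertoire."""
    if len(data) % 2 != 0:
        return False  # an odd number of octets cannot be two-octet coding
    for i in range(0, len(data), 2):
        code = (data[i] << 8) | data[i + 1]  # R-octet first, then C-octet
        if code not in repertoire:
            return False
    return True


# Hypothetical sample repertoire: the letters A-Z and a-z only.
SAMPLE = set(range(0x0041, 0x005B)) | set(range(0x0061, 0x007B))

print(conforms(bytes.fromhex("00410062"), SAMPLE))  # True
print(conforms(bytes.fromhex("00410391"), SAMPLE))  # False (0391 not in SAMPLE)
```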
2.2 Conformance of devices
A device is in conformance with this Subset if it conforms to the requirements of 2.2.1, and either or both of 2.2.2 and 2.2.3.
A claim of conformance shall identify the document which contains the description specified in 2.2.1, and shall identify whether the European Latin, the European Greek, the European Cyrillic or the European Special Character Repertoire, or any other subrepertoire of this Subset specified in this AS, or a combination of these, is adopted.
2.2.1 Device description
A device that conforms to this Subset shall be the subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in 2.2.2 and 2.2.3.
2.2.2 Originating devices
An originating device shall allow its user to supply any sequence of characters from the repertoire adopted, and shall be capable of transmitting their coded representations within a CC-data-element.
2.2.3 Receiving devices
A receiving device shall be capable of receiving and interpreting any coded representations of characters that are within a CC-data-element, and that conform to 2.1, and shall make the corresponding characters available to its user in such a way that the user can identify them from among those of the repertoire adopted, and can distinguish them from each other.
3 (3) NORMATIVE REFERENCES
The following standards contain provisions which, through reference in this text, constitute provisions of this Atlantic Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Atlantic Standard are encouraged to investigate the possibility of applying the most recent editions of the standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards.
ISO/IEC 2022:1994 Information processing - 7-bit and 8-bit coded character sets - Code extension techniques.
ISO/IEC 6429:1993 Information processing - Control Functions.
ISO/IEC 10367:1991 Information processing - Standardized coded graphic character sets for use in 8-bit codes.
ISO/IEC 10646-1:1993 Information processing - Universal Multiple-octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane.
4 (4) DEFINITIONS
The numbers of definitions are those given in ISO/IEC 10646-1:1993.
Where necessitated by the scope of this Subset definitions have been changed to avoid referring to features not included.
4.4 coded-character-data-element (CC-data-element): An element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets.
4.5 cell: The place within a row at which an individual character may be allocated.
4.6 character: A member of a set of elements used for the organization, control or representation of data.
4.8 coded character: A character together with its coded representation.
4.9 coded character set; code: A set of unambiguous rules that establishes a character set and the relationship between the characters of the set and their coded representation.
4.10 code table: A table showing the characters allocated to the octets in a code.
4.14 control function: An action that affects the recording, processing, transmission or interpretation of data, and that has a coded representation consisting of one or more octets.
4.17 device: A component of information processing equipment which can transmit, and/or receive, coded information within CC-data-elements.
4.18 graphic character: A character, other than a control function, that has a visual representation normally handwritten, printed or displayed.
4.19 graphic symbol: A visual representation of a graphic character or of a control function.
4.23 octet: An ordered sequence of eight bits considered as a unit.
4.24 plane: The coding space of this Subset; of 256 rows.
4.28 repertoire: A specified set of characters that are represented by means of one or more bit combinations of a coded character set.
4.29 row: A subdivision of a plane; of 256 cells.
4.30 script: A set of graphic characters used for the written form of one or more languages.
4.32 user: A person or other entity that invokes the services provided by a device.
5 (5) THE UNIVERSAL MULTIPLE-OCTET CODED CHARACTER SET GENERAL STRUCTURE
The general structure of the Universal Multiple-Octet Coded Character Set, of which this Subset is a proper subset, is described in this explanatory clause. The normative specification of the structure is given in later clauses (indicated there by the use of the term "shall").
The value of any octet is expressed in hexadecimal notation from 00 to FF in ISO/IEC 10646.
The canonical form of UCS uses a four-dimensional coding space, regarded as a single entity, consisting of 256 * 256 planes. Each plane consists of 256 one-dimensional rows, each row consisting of 256 cells. A character is located and coded at a cell within this coding space or the cell is declared unused.
In the canonical form, four octets are used to represent each character. The first plane, having 00 00 as its first two octets, is called the Basic Multilingual Plane (BMP).
In addition to the canonical form, a two-octet BMP is specified. This BMP can be used as a two-octet coded character set identified as UCS-2.
Subsets of the coding space may be used to give a sub-repertoire of graphic characters. The Atlantic Subset specifies a selection of the coded characters of UCS-2, using two-octet coding only.
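The relation between the four-octet canonical form and the two-octet form for characters of the BMP can be sketched in Python as follows. The function names are hypothetical and chosen for this illustration only; the sketch assumes nothing beyond the rule that a BMP character has 00 00 as its first two octets in the canonical form.

```python
def ucs4_to_ucs2(four: bytes) -> bytes:
    """Drop the two leading 00 octets of a canonical-form BMP character."""
    if len(four) != 4:
        raise ValueError("the canonical form uses exactly four octets")
    if four[0] != 0x00 or four[1] != 0x00:
        raise ValueError("character lies outside the BMP")
    return four[2:]


def ucs2_to_ucs4(two: bytes) -> bytes:
    """Prefix a two-octet BMP character with 00 00."""
    if len(two) != 2:
        raise ValueError("the two-octet form uses exactly two octets")
    return b"\x00\x00" + two


print(ucs4_to_ucs2(bytes.fromhex("00000041")).hex().upper())  # 0041
print(ucs2_to_ucs4(bytes.fromhex("0041")).hex().upper())      # 00000041
```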
6 (6) CODING OF CHARACTERS
In the UCS-2, and thus in this Subset, each character shall be represented by a sequence of two octets. The most significant octet of this sequence shall be the row-octet. The least significant octet of this sequence shall be the cell-octet. Terming the octets for brevity as R-octet and C-octet, this sequence may be represented as
most-significant least-significant
R-octet C-octet
The value of any octet shall be represented by two hexadecimal digits, for example: 31 or FE. When a single character is to be identified in terms of the values of its row and cell, this shall be represented such as
0030 for DIGIT ZERO
0041 for LATIN CAPITAL LETTER A
Within each octet the most significant bit shall be bit 8 and the least significant bit shall be bit 1. Accordingly, the weight allocated to each bit shall be
high order bits low order bits
bit: b8 b7 b6 b5 b4 b3 b2 b1
weight: 128 64 32 16 8 4 2 1
The sequence of the octets that represent a character, and the most significant and least significant ends of it, shall be maintained as shown above. When serialized as octets, a more significant octet shall precede less significant octets. When not serialized as octets, the order of octets may be specified by agreement between sender and recipient.
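As an informal illustration of this clause, the following Python sketch splits a code value into its R-octet and C-octet, writes the result in the four-digit hexadecimal notation, and lists the bits of an octet from b8 down to b1. The function names are hypothetical and are not part of this Subset.

```python
def to_octets(code: int) -> bytes:
    """Return the R-octet followed by the C-octet for a two-octet code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code does not fit in two octets")
    r_octet = code >> 8    # most significant octet: the row
    c_octet = code & 0xFF  # least significant octet: the cell
    return bytes([r_octet, c_octet])


def notation(octets: bytes) -> str:
    """Write each octet as two hexadecimal digits."""
    return "".join(f"{o:02X}" for o in octets)


def bits(octet: int) -> list:
    """Return the bits b8..b1 (weights 128..1) of one octet."""
    return [(octet >> (n - 1)) & 1 for n in range(8, 0, -1)]


print(notation(to_octets(0x0041)))  # 0041 (LATIN CAPITAL LETTER A)
print(bits(0x41))                   # [0, 1, 0, 0, 0, 0, 0, 1]
```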
7 (7) SPECIAL FEATURES OF THIS SUBSET
8 (13) NATURE OF THIS SUBSET
ISO/IEC 10646 provides the specification of subsets of coded graphic characters for use in interchange, by originating devices and by receiving devices.
This Subset presents a "limited" subset in the sense defined in subclause 13.1 of ISO/IEC 10646-1:1993, in that it consists of a list of the graphic characters in the specified subset. It contains no reference to any of the collections listed in Annex A of that International Standard, such as LATIN-1 SUPPLEMENT, LATIN EXTENDED-A, LATIN EXTENDED-B or LATIN EXTENDED ADDITIONAL. Many of these collections contain characters not in the repertoire of this Subset.
9 (14) CODED REPRESENTATION FORM OF THIS SUBSET
This Subset provides only a single form, that of characters from the European repertoire with each character represented by two octets.
Within a CC-data-element conforming to the requirements of this Subset a character from the repertoire of this Subset shall be represented by two octets comprising the R-octet and the C-octet as specified in clause 6.
10 (15) IMPLEMENTATION LEVELS
This Subset does not specify implementation levels.
11 (16) USE OF CONTROL FUNCTIONS WITH THIS SUBSET
This Subset provides for use of control functions encoded according to ISO 2022, ISO/IEC 6429 or similarly structured standards for control functions, and standards derived from these. A set or subset of such control functions may be used in conjunction with this coded character set. These standards encode a control function as a sequence of one or more octets.
When a C0 control function of ISO/IEC 6429 is used with this coded character set, its coded representation as specified in ISO/IEC 6429 shall be padded to correspond with the number of octets adopted in this Subset. Thus, the least significant octet shall be the bit combination specified in ISO/IEC 6429, and the more significant octet shall consist of zeros only.
For example, the control function FORM FEED is represented by "000C" in this Subset.
For escape sequences, control sequences, and control strings (see ISO/IEC 6429) consisting of a coded control function consisting of a single bit combination, followed by additional bit combinations in the range 20 to 7F, each bit combination shall be padded by an octet with value 00.
For example, the escape sequence "ESC 02/00 04/00" is represented by "001B 0020 0040".
When using a C1 control function of ISO/IEC 6429 with this coded character set, it shall be coded as ESC Fe sequence (see ISO/IEC 6429) padded as specified above.
For example, the control function PARTIAL LINE BACKWARD - PLU (08/12 in ISO/IEC 6429 representation) is represented by "001B 004C".
Code extension control functions for the ISO 2022 code extension techniques (such as designation escape sequence, single shift and locking shift) shall not be used with this coded character set.
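The padding rules of this clause can be sketched in Python as follows. The function names are hypothetical; the sketch assumes only the rules stated above, namely that each one-octet bit combination is preceded by an octet 00, and that a C1 control function is first rewritten as an ESC Fe sequence.

```python
ESC = 0x1B


def pad(octets: bytes) -> bytes:
    """Precede every one-octet bit combination with an octet of value 00."""
    out = bytearray()
    for o in octets:
        out += bytes([0x00, o])
    return bytes(out)


def encode_c0(code: int) -> bytes:
    """Code a C0 control function (00 to 1F) in two octets."""
    if not 0x00 <= code <= 0x1F:
        raise ValueError("not a C0 control function")
    return pad(bytes([code]))


def encode_c1(code: int) -> bytes:
    """Code a C1 control function (80 to 9F) as a padded ESC Fe sequence."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control function")
    fe = code - 0x40  # for example 08/12 becomes 04/12
    return pad(bytes([ESC, fe]))


print(encode_c0(0x0C).hex().upper())                 # 000C (FORM FEED)
print(pad(bytes([ESC, 0x20, 0x40])).hex().upper())   # 001B00200040
print(encode_c1(0x8C).hex().upper())                 # 001B004C
```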
12 (17) DECLARATION OF IDENTIFICATION OF FEATURES
12.1 Purpose and context of identification
CC-data-elements conforming to ISO/IEC 10646 are intended to form all or part of a composite unit of coded information that is interchanged between an originator and a recipient. The identification of ISO/IEC 10646, this Subset, or any subset of it, that has been adopted by the originator must also be available to the recipient. The route by which such identification is communicated to the recipient is outside the scope of ISO/IEC 10646 and this Subset.
However, some standards for interchange of coded information may permit, or require, that the coded representation of the identification applicable to the CC-data-element forms a part of the interchanged information. Such coded representations provide all or part of an identification data element, which may be included in information interchange in accordance with the relevant standard.
12.2 Specification of identification
The coded representation for the identification of this Subset, or of any of its subrepertoires, or of a control function set used with any of those, is specified in another Atlantic Standard (in preparation).
13 (18) STRUCTURE OF THE CODE TABLES AND LISTS
Clause 14 (25) sets out the detailed code tables and the list of character names for the graphic characters, their coded representation, and the character name for each character.
The graphic symbols are to be regarded as typical visual representations of the characters. ISO/IEC 10646 does not attempt (nor does this Subset) to prescribe the exact shape of each character. The shape is affected by the design of the font employed, which is outside the scope of ISO/IEC 10646.
Graphic characters specified in ISO/IEC 10646 are uniquely identified by their names. This does not imply that the graphic symbols by which they are commonly imaged are always different. Examples of graphic characters with similar graphic symbols are LATIN CAPITAL LETTER A, GREEK CAPITAL LETTER ALPHA, and CYRILLIC CAPITAL LETTER A.
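The three characters named in this example can be looked up by code in Python. This is a convenience sketch only: it assumes that the names reported by Python's standard `unicodedata` module, which follows the Unicode Character Database, agree with the ISO/IEC 10646 names for these characters.

```python
import unicodedata

# Three characters whose graphic symbols look alike but whose names differ.
CODES = (0x0041, 0x0391, 0x0410)

names = {code: unicodedata.name(chr(code)) for code in CODES}

for code, name in names.items():
    print(f"{code:04X} {name}")
```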
The meaning attributed to any character is not specified by ISO/IEC 10646; it may differ from country to country, or from one application to another.
14 (25) CODE TABLES AND LISTS OF CHARACTER NAMES
The coding of a character shall be as specified in the tables of this clause. The characters included in these tables constitute the Atlantic Subset of ISO/IEC 10646.
14.1 The characters and their coding required for the European Latin repertoire of letters and digits are specified in Table 1.
14.2 The characters and their coding required for the European Special Characters repertoire are specified in Table 2.
NOTE: These repertoires taken together (the European Latin subrepertoire) contain that of ISO/IEC 6937:1994 as a proper subrepertoire, the remaining characters being those needed for Welsh. A claim that the ISO/IEC 6937:1994 repertoire is covered may be formulated as covering the European Latin subrepertoire without Full Welsh, or as covering the Latin-Telematic subrepertoire (see Table 1).
14.3 The characters and their coding required for the European Greek repertoire of letters and special characters are specified in Part 2 of this AS.
14.4 The characters and their coding required for the European Cyrillic repertoire of letters and special characters are specified in Part 3 of this AS.
14.5 The characters and their coding required for the European repertoire of box drawing characters (as included in ISO/IEC 10367) are specified in Part 4 of this AS.
14.6 The relation between the Atlantic Subset and its subrepertoires may be illustrated by the following scheme:
VERSION 2.1
1995-02-15, correct. 1998-08-04
J. W. van Wingen
COMPLETE REPERTOIRE OF LETTERS AND DIGITS REQUIRED FOR LATIN WRITTEN EUROPEAN LANGUAGES
Grouped by Short Identifier (SID).
Transformation to ASCII in first column,
SGML public entities in 2nd column.
Table in ISO 10367 where the character is included:
Indication in columns 63-72:
0 | (Table 1/2) Basic G0 Set (as of ISO 4873) |
1 | (Table 3/4) Latin Alphabet No. 1 (as of ISO 8859-1) |
2 | (Table 5/6) Latin Alphabet No. 2 (as of ISO 8859-2) |
3 | (Table 7/8) Latin Alphabet No. 3 (as of ISO 8859-3) |
4 | (Table 9/10) Latin Alphabet No. 4 (as of ISO 8859-4) |
5 | (Table 11/12) Latin Alphabet No. 5 (as of ISO 8859-9) |
A | (Table 21/22) Supplementary Set for Latin Alphabets |
C/X | (Table A.1/2) ISO 6937, Supplementary Set (C) or Repertoire only (X) |
T | Used in Teletex (CCITT T.61) |
V | Used in Videotex (CCITT T.101) |
Code in ISO 10646 indicated as double bytes in hexadecimal notation.
Named subrepertoires:
BASIC LATIN : requires all characters of this table marked with 0.
LATIN-1 : requires all characters of this table marked with 1.
LATIN-TELEMATIC: requires all characters of this table.
Note: These subrepertoires also require characters from Table 2.
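The ten-character indication string used in the table rows can be read mechanically, as in the following Python sketch. The names `FLAG_COLUMNS`, `parse_flags`, `in_basic_latin` and `in_latin1` are hypothetical. The sketch assumes that a dot means "not included", and that a leading "* " marker, as found on some rows, may simply be stripped before reading the ten positions.

```python
# One entry per position of the indication string (columns 63-72).
FLAG_COLUMNS = ["0", "1", "2", "3", "4", "5", "A", "C/X", "T", "V"]


def parse_flags(flags: str) -> dict:
    """Map each column of the legend to True if the row is marked for it."""
    flags = flags.lstrip("* ")  # assumption: drop a leading "* " marker
    if len(flags) != len(FLAG_COLUMNS):
        raise ValueError("expected ten indication characters")
    return {name: ch != "." for name, ch in zip(FLAG_COLUMNS, flags)}


def in_basic_latin(flags: str) -> bool:
    """True if the row is marked with 0."""
    return parse_flags(flags)["0"]


def in_latin1(flags: str) -> bool:
    """True if the row is marked with 1."""
    return parse_flags(flags)["1"]


print(in_basic_latin("0.......TV"))  # True
print(in_latin1(".12345.XTV"))       # True
print(in_latin1("..2....XTV"))       # False
```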
Transformation to ASCII | SGML public entity | Short identifier (SID) | Letter name | Tables in ISO 10367 | ISO 10646 code (hexadecimal) |
a | | LA01 | LATIN SMALL LETTER A | 0.......TV | 0061 |
A | | LA02 | LATIN CAPITAL LETTER A | 0.......TV | 0041 |
b | | LB01 | LATIN SMALL LETTER B | 0.......TV | 0062 |
B | | LB02 | LATIN CAPITAL LETTER B | 0.......TV | 0042 |
c | | LC01 | LATIN SMALL LETTER C | 0.......TV | 0063 |
C | | LC02 | LATIN CAPITAL LETTER C | 0.......TV | 0043 |
d | | LD01 | LATIN SMALL LETTER D | 0.......TV | 0064 |
D | | LD02 | LATIN CAPITAL LETTER D | 0.......TV | 0044 |
e | | LE01 | LATIN SMALL LETTER E | 0.......TV | 0065 |
E | | LE02 | LATIN CAPITAL LETTER E | 0.......TV | 0045 |
f | | LF01 | LATIN SMALL LETTER F | 0.......TV | 0066 |
F | | LF02 | LATIN CAPITAL LETTER F | 0.......TV | 0046 |
g | | LG01 | LATIN SMALL LETTER G | 0.......TV | 0067 |
G | | LG02 | LATIN CAPITAL LETTER G | 0.......TV | 0047 |
h | | LH01 | LATIN SMALL LETTER H | 0.......TV | 0068 |
H | | LH02 | LATIN CAPITAL LETTER H | 0.......TV | 0048 |
i | | LI01 | LATIN SMALL LETTER I | 0.......TV | 0069 |
I | | LI02 | LATIN CAPITAL LETTER I | 0.......TV | 0049 |
j | | LJ01 | LATIN SMALL LETTER J | 0.......TV | 006A |
J | | LJ02 | LATIN CAPITAL LETTER J | 0.......TV | 004A |
k | | LK01 | LATIN SMALL LETTER K | 0.......TV | 006B |
K | | LK02 | LATIN CAPITAL LETTER K | 0.......TV | 004B |
l | | LL01 | LATIN SMALL LETTER L | 0.......TV | 006C |
L | | LL02 | LATIN CAPITAL LETTER L | 0.......TV | 004C |
m | | LM01 | LATIN SMALL LETTER M | 0.......TV | 006D |
M | | LM02 | LATIN CAPITAL LETTER M | 0.......TV | 004D |
n | | LN01 | LATIN SMALL LETTER N | 0.......TV | 006E |
N | | LN02 | LATIN CAPITAL LETTER N | 0.......TV | 004E |
o | | LO01 | LATIN SMALL LETTER O | 0.......TV | 006F |
O | | LO02 | LATIN CAPITAL LETTER O | 0.......TV | 004F |
p | | LP01 | LATIN SMALL LETTER P | 0.......TV | 0070 |
P | | LP02 | LATIN CAPITAL LETTER P | 0.......TV | 0050 |
q | | LQ01 | LATIN SMALL LETTER Q | 0.......TV | 0071 |
Q | | LQ02 | LATIN CAPITAL LETTER Q | 0.......TV | 0051 |
r | | LR01 | LATIN SMALL LETTER R | 0.......TV | 0072 |
R | | LR02 | LATIN CAPITAL LETTER R | 0.......TV | 0052 |
s | | LS01 | LATIN SMALL LETTER S | 0.......TV | 0073 |
S | | LS02 | LATIN CAPITAL LETTER S | 0.......TV | 0053 |
t | | LT01 | LATIN SMALL LETTER T | 0.......TV | 0074 |
T | | LT02 | LATIN CAPITAL LETTER T | 0.......TV | 0054 |
u | | LU01 | LATIN SMALL LETTER U | 0.......TV | 0075 |
U | | LU02 | LATIN CAPITAL LETTER U | 0.......TV | 0055 |
v | | LV01 | LATIN SMALL LETTER V | 0.......TV | 0076 |
V | | LV02 | LATIN CAPITAL LETTER V | 0.......TV | 0056 |
w | | LW01 | LATIN SMALL LETTER W | 0.......TV | 0077 |
W | | LW02 | LATIN CAPITAL LETTER W | 0.......TV | 0057 |
x | | LX01 | LATIN SMALL LETTER X | 0.......TV | 0078 |
X | | LX02 | LATIN CAPITAL LETTER X | 0.......TV | 0058 |
y | | LY01 | LATIN SMALL LETTER Y | 0.......TV | 0079 |
Y | | LY02 | LATIN CAPITAL LETTER Y | 0.......TV | 0059 |
z | | LZ01 | LATIN SMALL LETTER Z | 0.......TV | 007A |
Z | | LZ02 | LATIN CAPITAL LETTER Z | 0.......TV | 005A |
/a | &aacute | LA11 | LATIN SMALL LETTER A WITH ACUTE | .12345.XTV | 00E1 |
/A | &Aacute | LA12 | LATIN CAPITAL LETTER A WITH ACUTE | .12345.XTV | 00C1 |
/c | &cacute | LC11 | LATIN SMALL LETTER C WITH ACUTE | ..2....XTV | 0107 |
/C | &Cacute | LC12 | LATIN CAPITAL LETTER C WITH ACUTE | ..2....XTV | 0106 |
/e | &eacute | LE11 | LATIN SMALL LETTER E WITH ACUTE | .12345.XTV | 00E9 |
/E | &Eacute | LE12 | LATIN CAPITAL LETTER E WITH ACUTE | .12345.XTV | 00C9 |
/i | &iacute | LI11 | LATIN SMALL LETTER I WITH ACUTE | .12345.XTV | 00ED |
/I | &Iacute | LI12 | LATIN CAPITAL LETTER I WITH ACUTE | .12345.XTV | 00CD |
/l | &lacute | LL11 | LATIN SMALL LETTER L WITH ACUTE | ..2....XTV | 013A |
/L | &Lacute | LL12 | LATIN CAPITAL LETTER L WITH ACUTE | ..2....XTV | 0139 |
/n | &nacute | LN11 | LATIN SMALL LETTER N WITH ACUTE | ..2....XTV | 0144 |
/N | &Nacute | LN12 | LATIN CAPITAL LETTER N WITH ACUTE | ..2....XTV | 0143 |
/o | &oacute | LO11 | LATIN SMALL LETTER O WITH ACUTE | .123.5.XTV | 00F3 |
/O | &Oacute | LO12 | LATIN CAPITAL LETTER O WITH ACUTE | .123.5.XTV | 00D3 |
/r | &racute | LR11 | LATIN SMALL LETTER R WITH ACUTE | ..2....XTV | 0155 |
/R | &Racute | LR12 | LATIN CAPITAL LETTER R WITH ACUTE | ..2....XTV | 0154 |
/s | &sacute | LS11 | LATIN SMALL LETTER S WITH ACUTE | ..2....XTV | 015B |
/S | &Sacute | LS12 | LATIN CAPITAL LETTER S WITH ACUTE | ..2....XTV | 015A |
/u | &uacute | LU11 | LATIN SMALL LETTER U WITH ACUTE | .12345.XTV | 00FA |
/U | &Uacute | LU12 | LATIN CAPITAL LETTER U WITH ACUTE | .12345.XTV | 00DA |
/w | &wacute | LW11 | LATIN SMALL LETTER W WITH ACUTE | * .......... | 1E83 |
/W | &Wacute | LW12 | LATIN CAPITAL LETTER W WITH ACUTE | * .......... | 1E82 |
/y | &yacute | LY11 | LATIN SMALL LETTER Y WITH ACUTE | .12...AXTV | 00FD |
/Y | &Yacute | LY12 | LATIN CAPITAL LETTER Y WITH ACUTE | .12...AXTV | 00DD |
/z | &zacute | LZ11 | LATIN SMALL LETTER Z WITH ACUTE | ..2....XTV | 017A |
/Z | &Zacute | LZ12 | LATIN CAPITAL LETTER Z WITH ACUTE | ..2....XTV | 0179 |
|
|||||
\a | &agrave | LA13 | LATIN SMALL LETTER A WITH GRAVE | .1.3.5.XTV | 00E0 |
\A | &Agrave | LA14 | LATIN CAPITAL LETTER A WITH GRAVE | .1.3.5.XTV | 00C0 |
\e | &egrave | LE13 | LATIN SMALL LETTER E WITH GRAVE | .1.3.5.XTV | 00E8 |
\E | &Egrave | LE14 | LATIN CAPITAL LETTER E WITH GRAVE | .1.3.5.XTV | 00C8 |
\i | &igrave | LI13 | LATIN SMALL LETTER I WITH GRAVE | .1.3.5.XTV | 00EC |
\I | &Igrave | LI14 | LATIN CAPITAL LETTER I WITH GRAVE | .1.3.5.XTV | 00CC |
\o | &ograve | LO13 | LATIN SMALL LETTER O WITH GRAVE | .1.3.5.XTV | 00F2 |
\O | &Ograve | LO14 | LATIN CAPITAL LETTER O WITH GRAVE | .1.3.5.XTV | 00D2 |
\u | &ugrave | LU13 | LATIN SMALL LETTER U WITH GRAVE | .1.3.5.XTV | 00F9 |
\U | &Ugrave | LU14 | LATIN CAPITAL LETTER U WITH GRAVE | .1.3.5.XTV | 00D9 |
\w | &wgrave | LW13 | LATIN SMALL LETTER W WITH GRAVE | * .......... | 1E81 |
\W | &Wgrave | LW14 | LATIN CAPITAL LETTER W WITH GRAVE | * .......... | 1E80 |
\y | &ygrave | LY13 | LATIN SMALL LETTER Y WITH GRAVE | * .......... | 1EF3 |
\Y | &Ygrave | LY14 | LATIN CAPITAL LETTER Y WITH GRAVE | * .......... | 1EF2 |
|
|||||
>a | &acirc | LA15 | LATIN SMALL LETTER A WITH CIRCUMFLEX | .12345.XTV | 00E2 |
>A | &Acirc | LA16 | LATIN CAPITAL LETTER A WITH CIRCUMFLEX | .12345.XTV | 00C2 |
>c | &ccirc | LC15 | LATIN SMALL LETTER C WITH CIRCUMFLEX | ...3..AXTV | 0109 |
>C | &Ccirc | LC16 | LATIN CAPITAL LETTER C WITH CIRCUMFLEX | ...3..AXTV | 0108 |
>e | &ecirc | LE15 | LATIN SMALL LETTER E WITH CIRCUMFLEX | .1.3.5.XTV | 00EA |
>E | &Ecirc | LE16 | LATIN CAPITAL LETTER E WITH CIRCUMFLEX | .1.3.5.XTV | 00CA |
>g | &gcirc | LG15 | LATIN SMALL LETTER G WITH CIRCUMFLEX | ...3..AXTV | 011D |
>G | &Gcirc | LG16 | LATIN CAPITAL LETTER G WITH CIRCUMFLEX | ...3..AXTV | 011C |
>h | &hcirc | LH15 | LATIN SMALL LETTER H WITH CIRCUMFLEX | ...3..AXTV | 0125 |
>H | &Hcirc | LH16 | LATIN CAPITAL LETTER H WITH CIRCUMFLEX | ...3..AXTV | 0124 |
>i | &icirc | LI15 | LATIN SMALL LETTER I WITH CIRCUMFLEX | .12345.XTV | 00EE |
>I | &Icirc | LI16 | LATIN CAPITAL LETTER I WITH CIRCUMFLEX | .12345.XTV | 00CE |
>j | &jcirc | LJ15 | LATIN SMALL LETTER J WITH CIRCUMFLEX | ...3..AXTV | 0135 |
>J | &Jcirc | LJ16 | LATIN CAPITAL LETTER J WITH CIRCUMFLEX | ...3..AXTV | 0134 |
>o | &ocirc | LO15 | LATIN SMALL LETTER O WITH CIRCUMFLEX | .12345.XTV | 00F4 |
>O | &Ocirc | LO16 | LATIN CAPITAL LETTER O WITH CIRCUMFLEX | .12345.XTV | 00D4 |
>s | &scirc | LS15 | LATIN SMALL LETTER S WITH CIRCUMFLEX | ...3..AXTV | 015D |
>S | &Scirc | LS16 | LATIN CAPITAL LETTER S WITH CIRCUMFLEX | ...3..AXTV | 015C |
>u | &ucirc | LU15 | LATIN SMALL LETTER U WITH CIRCUMFLEX | .1.345.XTV | 00FB |
>U | &Ucirc | LU16 | LATIN CAPITAL LETTER U WITH CIRCUMFLEX | .1.345.XTV | 00DB |
>w | &wcirc | LW15 | LATIN SMALL LETTER W WITH CIRCUMFLEX | ......AXTV | 0175 |
>W | &Wcirc | LW16 | LATIN CAPITAL LETTER W WITH CIRCUMFLEX | ......AXTV | 0174 |
>y | &ycirc | LY15 | LATIN SMALL LETTER Y WITH CIRCUMFLEX | ......AXTV | 0177 |
>Y | &Ycirc | LY16 | LATIN CAPITAL LETTER Y WITH CIRCUMFLEX | ......AXTV | 0176 |
|
|||||
%a | &auml | LA17 | LATIN SMALL LETTER A WITH DIAERESIS | .12345.XTV | 00E4 |
%A | &Auml | LA18 | LATIN CAPITAL LETTER A WITH DIAERESIS | .12345.XTV | 00C4 |
%e | &euml | LE17 | LATIN SMALL LETTER E WITH DIAERESIS | .12345.XTV | 00EB |
%E | &Euml | LE18 | LATIN CAPITAL LETTER E WITH DIAERESIS | .12345.XTV | 00CB |
%i | &iuml | LI17 | LATIN SMALL LETTER I WITH DIAERESIS | .1.3.5.XTV | 00EF |
%I | &Iuml | LI18 | LATIN CAPITAL LETTER I WITH DIAERESIS | .1.3.5.XTV | 00CF |
%o | &ouml | LO17 | LATIN SMALL LETTER O WITH DIAERESIS | .12345.XTV | 00F6 |
%O | &Ouml | LO18 | LATIN CAPITAL LETTER O WITH DIAERESIS | .12345.XTV | 00D6 |
%u | &uuml | LU17 | LATIN SMALL LETTER U WITH DIAERESIS | .12345.XTV | 00FC |
%U | &Uuml | LU18 | LATIN CAPITAL LETTER U WITH DIAERESIS | .12345.XTV | 00DC |
%w | &wuml | LW17 | LATIN SMALL LETTER W WITH DIAERESIS | * .......... | 1E85 |
%W | &Wuml | LW18 | LATIN CAPITAL LETTER W WITH DIAERESIS | * .......... | 1E84 |
%y | &yuml | LY17 | LATIN SMALL LETTER Y WITH DIAERESIS | .1...5.XTV | 00FF |
%Y | &Yuml | LY18 | LATIN CAPITAL LETTER Y WITH DIAERESIS | ......AXTV | 0178 |
|
|||||
~a | &atilde | LA19 | LATIN SMALL LETTER A WITH TILDE | .1..45.XTV | 00E3 |
~A | &Atilde | LA20 | LATIN CAPITAL LETTER A WITH TILDE | .1..45.XTV | 00C3 |
~n | &ntilde | LN19 | LATIN SMALL LETTER N WITH TILDE | .1.3.5.XTV | 00F1 |
~N | &Ntilde | LN20 | LATIN CAPITAL LETTER N WITH TILDE | .1.3.5.XTV | 00D1 |
~o | &otilde | LO19 | LATIN SMALL LETTER O WITH TILDE | .1..45.XTV | 00F5 |
~O | &Otilde | LO20 | LATIN CAPITAL LETTER O WITH TILDE | .1..45.XTV | 00D5 |
|
|||||
*c | &ccaron | LC21 | LATIN SMALL LETTER C WITH CARON | ..2.4..XTV | 010D |
*C | &Ccaron | LC22 | LATIN CAPITAL LETTER C WITH CARON | ..2.4..XTV | 010C |
*d | &dcaron | LD21 | LATIN SMALL LETTER D WITH CARON | ..2....XTV | 010F |
*D | &Dcaron | LD22 | LATIN CAPITAL LETTER D WITH CARON | ..2....XTV | 010E |
*e | &ecaron | LE21 | LATIN SMALL LETTER E WITH CARON | ..2....XTV | 011B |
*E | &Ecaron | LE22 | LATIN CAPITAL LETTER E WITH CARON | ..2....XTV | 011A |
*l | &lcaron | LL21 | LATIN SMALL LETTER L WITH CARON | ..2....XTV | 013E |
*L | &Lcaron | LL22 | LATIN CAPITAL LETTER L WITH CARON | ..2....XTV | 013D |
*n | &ncaron | LN21 | LATIN SMALL LETTER N WITH CARON | ..2....XTV | 0148 |
*N | &Ncaron | LN22 | LATIN CAPITAL LETTER N WITH CARON | ..2....XTV | 0147 |
*r | &rcaron | LR21 | LATIN SMALL LETTER R WITH CARON | ..2....XTV | 0159 |
*R | &Rcaron | LR22 | LATIN CAPITAL LETTER R WITH CARON | ..2....XTV | 0158 |
*s | &scaron | LS21 | LATIN SMALL LETTER S WITH CARON | ..2.4..XTV | 0161 |
*S | &Scaron | LS22 | LATIN CAPITAL LETTER S WITH CARON | ..2.4..XTV | 0160 |
*t | &tcaron | LT21 | LATIN SMALL LETTER T WITH CARON | ..2....XTV | 0165 |
*T | &Tcaron | LT22 | LATIN CAPITAL LETTER T WITH CARON | ..2....XTV | 0164 |
*z | &zcaron | LZ21 | LATIN SMALL LETTER Z WITH CARON | ..2.4..XTV | 017E |
*Z | &Zcaron | LZ22 | LATIN CAPITAL LETTER Z WITH CARON | ..2.4..XTV | 017D |
|
|||||
#a | &abreve | LA23 | LATIN SMALL LETTER A WITH BREVE | ..2....XTV | 0103 |
#A | &Abreve | LA24 | LATIN CAPITAL LETTER A WITH BREVE | ..2....XTV | 0102 |
#g | &gbreve | LG23 | LATIN SMALL LETTER G WITH BREVE | ...3.5AXTV | 011F |
#G | &Gbreve | LG24 | LATIN CAPITAL LETTER G WITH BREVE | ...3.5AXTV | 011E |
#u | &ubreve | LU23 | LATIN SMALL LETTER U WITH BREVE | ...3..AXTV | 016D |
#U | &Ubreve | LU24 | LATIN CAPITAL LETTER U WITH BREVE | ...3..AXTV | 016C |
|
|||||
+o | &odblac | LO25 | LATIN SMALL LETTER O WITH DOUBLE ACUTE | ..2....XTV | 0151 |
+O | &Odblac | LO26 | LATIN CAPITAL LETTER O WITH DOUBLE ACUTE | ..2....XTV | 0150 |
+u | &udblac | LU25 | LATIN SMALL LETTER U WITH DOUBLE ACUTE | ..2....XTV | 0171 |
+U | &Udblac | LU26 | LATIN CAPITAL LETTER U WITH DOUBLE ACUTE | ..2....XTV | 0170 |
|
|||||
@a | &aring | LA27 | LATIN SMALL LETTER A WITH RING ABOVE | .1..45.XTV | 00E5 |
@A | &Aring | LA28 | LATIN CAPITAL LETTER A WITH RING ABOVE | .1..45.XTV | 00C5 |
@u | &uring | LU27 | LATIN SMALL LETTER U WITH RING ABOVE | ..2....XTV | 016F |
@U | &Uring | LU28 | LATIN CAPITAL LETTER U WITH RING ABOVE | ..2....XTV | 016E |
|
|||||
@c | &cdot | LC29 | LATIN SMALL LETTER C WITH DOT ABOVE | ...3..AXTV | 010B |
@C | &Cdot | LC30 | LATIN CAPITAL LETTER C WITH DOT ABOVE | ...3..AXTV | 010A |
@e | &edot | LE29 | LATIN SMALL LETTER E WITH DOT ABOVE | ....4.AXTV | 0117 |
@E | &Edot | LE30 | LATIN CAPITAL LETTER E WITH DOT ABOVE | ....4.AXTV | 0116 |
@g | &gdot | LG29 | LATIN SMALL LETTER G WITH DOT ABOVE | ...3..AXTV | 0121 |
@G | &Gdot | LG30 | LATIN CAPITAL LETTER G WITH DOT ABOVE | ...3..AXTV | 0120 |
@I | &Idot | LI30 | LATIN CAPITAL LETTER I WITH DOT ABOVE | ...3.5AXTV | 0130 |
@i | &inodot | LI61 | LATIN SMALL LETTER DOTLESS I | ...3.5ACTV | 0131 |
@z | &zdot | LZ29 | LATIN SMALL LETTER Z WITH DOT ABOVE | ..23...XTV | 017C |
@Z | &Zdot | LZ30 | LATIN CAPITAL LETTER Z WITH DOT ABOVE | ..23...XTV | 017B |
|
|||||
=a | &amacr | LA31 | LATIN SMALL LETTER A WITH MACRON | ....4.AXTV | 0101 |
=A | &Amacr | LA32 | LATIN CAPITAL LETTER A WITH MACRON | ....4.AXTV | 0100 |
=e | &emacr | LE31 | LATIN SMALL LETTER E WITH MACRON | ....4.AXTV | 0113 |
=E | &Emacr | LE32 | LATIN CAPITAL LETTER E WITH MACRON | ....4.AXTV | 0112 |
=i | &imacr | LI31 | LATIN SMALL LETTER I WITH MACRON | ....4.AXTV | 012B |
=I | &Imacr | LI32 | LATIN CAPITAL LETTER I WITH MACRON | ....4.AXTV | 012A |
=o | &omacr | LO31 | LATIN SMALL LETTER O WITH MACRON | ....4.AXTV | 014D |
=O | &Omacr | LO32 | LATIN CAPITAL LETTER O WITH MACRON | ....4.AXTV | 014C |
=u | &umacr | LU31 | LATIN SMALL LETTER U WITH MACRON | ....4.AXTV | 016B |
=U | &Umacr | LU32 | LATIN CAPITAL LETTER U WITH MACRON | ....4.AXTV | 016A |
|
|||||
=d | &dstrok | LD61 | LATIN SMALL LETTER D WITH STROKE | ..2.4..CTV | 0111 |
=D | &Dstrok | LD62 | LATIN CAPITAL LETTER D WITH STROKE | ..2.4..CTV | 0110 |
=h | &hstrok | LH61 | LATIN SMALL LETTER H WITH STROKE | ...3..ACTV | 0127 |
=H | &Hstrok | LH62 | LATIN CAPITAL LETTER H WITH STROKE | ...3..ACTV | 0126 |
=l | &lstrok | LL61 | LATIN SMALL LETTER L WITH STROKE | ..2....CTV | 0142 |
=L | &Lstrok | LL62 | LATIN CAPITAL LETTER L WITH STROKE | ..2....CTV | 0141 |
$o | &ostrok | LO61 | LATIN SMALL LETTER O WITH STROKE | .1..45.CTV | 00F8 |
$O | &Ostrok | LO62 | LATIN CAPITAL LETTER O WITH STROKE | .1..45.CTV | 00D8 |
=t | &tstrok | LT61 | LATIN SMALL LETTER T WITH STROKE | ....4.ACTV | 0167 |
=T | &Tstrok | LT62 | LATIN CAPITAL LETTER T WITH STROKE | ....4.ACTV | 0166 |
|
|||||
$c | ç | LC41 | LATIN SMALL LETTER C WITH CEDILLA | .123.5.XTV | 00E7 |
$C | Ç | LC42 | LATIN CAPITAL LETTER C WITH CEDILLA | .123.5.XTV | 00C7 |
$g | &gcedil | LG41 | LATIN SMALL LETTER G WITH CEDILLA | ....4.AXTV | 0123 |
$G | &Gcedil | LG42 | LATIN CAPITAL LETTER G WITH CEDILLA | ....4.AXTV | 0122 |
$k | &kcedil | LK41 | LATIN SMALL LETTER K WITH CEDILLA | ....4.AXTV | 0137 |
$K | &Kcedil | LK42 | LATIN CAPITAL LETTER K WITH CEDILLA | ....4.AXTV | 0136 |
$l | &lcedil | LL41 | LATIN SMALL LETTER L WITH CEDILLA | ....4.AXTV | 013C |
$L | &Lcedil | LL42 | LATIN CAPITAL LETTER L WITH CEDILLA | ....4.AXTV | 013B |
$n | &ncedil | LN41 | LATIN SMALL LETTER N WITH CEDILLA | ....4.AXTV | 0146 |
$N | &Ncedil | LN42 | LATIN CAPITAL LETTER N WITH CEDILLA | ....4.AXTV | 0145 |
$r | &rcedil | LR41 | LATIN SMALL LETTER R WITH CEDILLA | ....4.AXTV | 0157 |
$R | &Rcedil | LR42 | LATIN CAPITAL LETTER R WITH CEDILLA | ....4.AXTV | 0156 |
$s | &scedil | LS41 | LATIN SMALL LETTER S WITH CEDILLA | ..23.5.XTV | 015F |
$S | &Scedil | LS42 | LATIN CAPITAL LETTER S WITH CEDILLA | ..23.5.XTV | 015E |
$t | &tcedil | LT41 | LATIN SMALL LETTER T WITH CEDILLA | ..2....XTV | 0163 |
$T | &Tcedil | LT42 | LATIN CAPITAL LETTER T WITH CEDILLA | ..2....XTV | 0162 |
|
|||||
$a | &aogon | LA43 | LATIN SMALL LETTER A WITH OGONEK | ..2.4..XTV | 0105 |
$A | &Aogon | LA44 | LATIN CAPITAL LETTER A WITH OGONEK | ..2.4..XTV | 0104 |
$e | &eogon | LE43 | LATIN SMALL LETTER E WITH OGONEK | ..2.4..XTV | 0119 |
$E | &Eogon | LE44 | LATIN CAPITAL LETTER E WITH OGONEK | ..2.4..XTV | 0118 |
$i | &iogon | LI43 | LATIN SMALL LETTER I WITH OGONEK | ....4.AXTV | 012F |
$I | &Iogon | LI44 | LATIN CAPITAL LETTER I WITH OGONEK | ....4.AXTV | 012E |
$u | &uogon | LU43 | LATIN SMALL LETTER U WITH OGONEK | ....4.AXTV | 0173 |
$U | &Uogon | LU44 | LATIN CAPITAL LETTER U WITH OGONEK | ....4.AXTV | 0172 |
|
|||||
&a | æ | LA51 | LATIN SMALL LETTER AE | .1..45.CTV | 00E6 |
&A | Æ | LA52 | LATIN CAPITAL LETTER AE | .1..45.CTV | 00C6 |
&i | &ijlig | LI51 | LATIN SMALL LIGATURE I J | ......ACTV | 0133 |
&I | &IJlig | LI52 | LATIN CAPITAL LIGATURE I J | ......ACTV | 0132 |
&o | &oelig | LO51 | LATIN SMALL LIGATURE O E | ......ACTV | 0153 |
&O | &OElig | LO52 | LATIN CAPITAL LIGATURE O E | ......ACTV | 0152 |
&s | ß | LS61 | LATIN SMALL LETTER SHARP S (German) | .12345.CTV | 00DF |
&n | &eng | LN61 | LATIN SMALL LETTER ENG (Sami) | ....4.ACTV | 014B |
&N | &ENG | LN62 | LATIN CAPITAL LETTER ENG (Sami) | ....4.ACTV | 014A |
&d | ð | LD63 | LATIN SMALL LETTER ETH (Icelandic) | .1....ACTV | 00F0 |
&D | Ð | LD64 | LATIN CAPITAL LETTER ETH (Icelandic) | .1......TV | 00D0 |
&t | þ | LT63 | LATIN SMALL LETTER THORN (Icelandic) | .1....ACTV | 00FE |
&T | Þ | LT64 | LATIN CAPITAL LETTER THORN (Icelandic) | .1....ACTV | 00DE |
Not included Letters | |||||
~i | &itilde | LI19 | LATIN SMALL LETTER I WITH TILDE | ....4.AXTV | 0129 |
~I | &Itilde | LI20 | LATIN CAPITAL LETTER I WITH TILDE | ....4.AXTV | 0128 |
~u | &utilde | LU19 | LATIN SMALL LETTER U WITH TILDE | ....4.AXTV | 0169 |
~U | &Utilde | LU20 | LATIN CAPITAL LETTER U WITH TILDE | ....4.AXTV | 0168 |
&k | &kgreen | LK61 | LATIN SMALL LETTER KRA (Greenlandic) | ....4.ACTV | 0138 |
&l | &lmidot | LL63 | LATIN SMALL LETTER L WITH MIDDLE DOT | ......ACTV | 0140 |
&L | &Lmidot | LL64 | LATIN CAPITAL LETTER L WITH MIDDLE DOT | ......ACTV | 013F |
'n | &napos | LN63 | LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | ......ACTV | 0149 |
Digits | |||||
1 | ND01 | DIGIT ONE | 0.......TV | 0031 | |
2 | ND02 | DIGIT TWO | 0.......TV | 0032 | |
3 | ND03 | DIGIT THREE | 0.......TV | 0033 | |
4 | ND04 | DIGIT FOUR | 0.......TV | 0034 | |
5 | ND05 | DIGIT FIVE | 0.......TV | 0035 | |
6 | ND06 | DIGIT SIX | 0.......TV | 0036 | |
7 | ND07 | DIGIT SEVEN | 0.......TV | 0037 | |
8 | ND08 | DIGIT EIGHT | 0.......TV | 0038 | |
9 | ND09 | DIGIT NINE | 0.......TV | 0039 | |
0 | ND10 | DIGIT ZERO | 0.......TV | 0030 |
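The first and last columns of the table above can be read as a lookup from the two-character ASCII transformation to the UCS code. The following is a minimal sketch only, using a handful of rows copied from the table; the names `ASCII_TO_UCS` and `decode_pair` are illustrative and are not defined by this AS.

```python
# Hypothetical lookup built from a few rows of the table above:
# two-character ASCII transformation -> four-digit UCS code.
ASCII_TO_UCS = {
    "*s": "0161",  # LATIN SMALL LETTER S WITH CARON
    "*S": "0160",  # LATIN CAPITAL LETTER S WITH CARON
    "#g": "011F",  # LATIN SMALL LETTER G WITH BREVE
    "+o": "0151",  # LATIN SMALL LETTER O WITH DOUBLE ACUTE
    "=e": "0113",  # LATIN SMALL LETTER E WITH MACRON
    "$c": "00E7",  # LATIN SMALL LETTER C WITH CEDILLA
    "&s": "00DF",  # LATIN SMALL LETTER SHARP S
}


def decode_pair(pair: str) -> str:
    """Return the character for a two-character ASCII transformation."""
    return chr(int(ASCII_TO_UCS[pair], 16))
```

Only the rows copied into the dictionary are covered; a pair that is not listed raises `KeyError`.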
COMPLETE REPERTOIRE OF SPECIAL CHARACTERS REQUIRED FOR LATIN WRITTEN
EUROPEAN LANGUAGES
AS INCLUDED IN ISO/IEC 10367
VERSION 2.1
1993-11-02, corr. 1998-08-04
J. W. van Wingen
Grouped to Short Identifier (SID)
Transformation to ASCII in first column,
SGML public entities in 2nd column
Indication in columns 63-72 as in Table 1
Transformation to ASCII | SGML public entries | Short identifiers (SID) | Letters name | Tables in ISO 10367 | Code in ISO 10646 | |
@1 | ¹ | NS01 | SUPERSCRIPT ONE | .1...5.C.. | 00B9 | |
@2 | ² | NS02 | SUPERSCRIPT TWO | .1.3.5.CTV | 00B2 | |
@3 | ³ | NS03 | SUPERSCRIPT THREE | .1.3.5.CTV | 00B3 | |
|
||||||
_2 | ½ | NF01 | VULGAR FRACTION ONE HALF | .1.3.5.CTV | 00BD | |
_4 | ¼ | NF04 | VULGAR FRACTION ONE QUARTER | .1...5.CTV | 00BC | |
_3 | ¾ | NF05 | VULGAR FRACTION THREE QUARTERS | .1...5.CTV | 00BE | |
=1 | &frac18 | NF18 | VULGAR FRACTION ONE EIGHTH | ......AC.V | 215B | |
=3 | &frac38 | NF19 | VULGAR FRACTION THREE EIGHTHS | ......AC.V | 215C | |
=5 | &frac58 | NF20 | VULGAR FRACTION FIVE EIGHTHS | ......AC.V | 215D | |
=7 | &frac78 | NF21 | VULGAR FRACTION SEVEN EIGHTHS | ......AC.V | 215E | |
|
||||||
++ | &plus | SA01 | PLUS SIGN | 0.......TV | 002B | |
< | < | SA03 | LESS-THAN SIGN | 0.......TV | 003C | |
== | &equals | SA04 | EQUALS SIGN | 0.......TV | 003D | |
>> | > | SA05 | GREATER-THAN SIGN | 0.......TV | 003E | |
_+ | ± | SA02 | PLUS-MINUS SIGN | .1...5.CTV | 00B1 | |
_: | ÷ | SA06 | DIVISION SIGN | .12345.CTV | 00F7 | |
_* | × | SA07 | MULTIPLICATION SIGN | .12345.CTV | 00D7 | |
|
||||||
_f | ¤ | SC01 | CURRENCY SIGN | .12345.CTV | 00A4 | |
_L | £ | SC02 | POUND SIGN | .1.3.5.CTV | 00A3 | |
$$ | &dollar | SC03 | DOLLAR SIGN | 0.......TV | 0024 | |
_c | ¢ | SC04 | CENT SIGN | .1...5.CTV | 00A2 | |
_Y | ¥ | SC05 | YEN SIGN | .1...5.CTV | 00A5 | |
|
||||||
@/ | ´ | SD11 | ACUTE ACCENT | .12345.XT. | 00B4 | |
@\ | &grave | SD13 | GRAVE ACCENT | 0......XT. | 0060 | |
@> | &circ | SD15 | CIRCUMFLEX ACCENT | 0......XT. | 005E | |
@% | &die | SD17 | DIAERESIS | .12345.XT. | 00A8 | |
@$ | &tilde | SD19 | TILDE | 0......XT. | 007E | |
@* | &caron | SD21 | CARON | ..2.4..XT. | 02C7 | |
@# | &breve | SD23 | BREVE | ..23...XT. | 02D8 | |
@" | &dblac | SD25 | DOUBLE ACUTE ACCENT | ..2....XT. | 02DD | |
@0 | &ring | SD27 | RING ABOVE | .......XT. | 02DA | |
@. | &dot | SD29 | DOT ABOVE | ..234..XT. | 02D9 | |
@= | &macron | SD31 | MACRON | .1..45.XT. | 00AF | |
_) | ¸ | SD41 | CEDILLA | .12345.XT. | 00B8 | |
_( | &ogon | SD43 | OGONEK | ..2.4..XT. | 02DB | |
|
||||||
## | &num | SM01 | NUMBER SIGN | 0.......TV | 0023 | |
%% | &percnt | SM02 | PERCENT SIGN | 0.......TV | 0025 | |
&& | & | SM03 | AMPERSAND | 0.......TV | 0026 | |
** | &ast | SM04 | ASTERISK | 0.......TV | 002A | |
@@ | &commat | SM05 | COMMERCIAL AT | 0.......TV | 0040 | |
*( | &lsqb | SM06 | LEFT SQUARE BRACKET | 0.......T. | 005B | |
\\ | &bsol | SM07 | REVERSE SOLIDUS | 0......... | 005C | |
*) | &rsqb | SM08 | RIGHT SQUARE BRACKET | 0.......T. | 005D | |
{ | &lcub | SM11 | LEFT CURLY BRACKET | 0......... | 007B | |
_- | &horbar | SM12 | HORIZONTAL BAR | ......AC.V | 2015 | |
| | &verbar | SM13 | VERTICAL LINE | 0.......TV | 007C | |
} | &rcub | SM14 | RIGHT CURLY BRACKET | 0......... | 007D | |
_m | µ | SM17 | MICRO SIGN | .1.3.5.CTV | 00B5 | |
_O | &ohm | SM18 | OHM SIGN | ......ACTV | 2126 | |
@0 | ° | SM19 | DEGREE SIGN | .12345.CTV | 00B0 | |
_o | º | SM20 | MASCULINE ORDINAL INDICATOR | .1...5.CTV | 00BA | |
_a | ª | SM21 | FEMININE ORDINAL INDICATOR | .1...5.CTV | 00AA | |
#S | § | SM24 | SECTION SIGN | .12345.CTV | 00A7 | |
#P | ¶ | SM25 | PILCROW SIGN | .1...5.CTV | 00B6 | |
#. | · | SM26 | MIDDLE DOT | .1.3.5.CTV | 00B7 | |
_< | &larr | SM30 | LEFTWARDS ARROW | ......AC.V | 2190 | |
_> | &rarr | SM31 | RIGHTWARDS ARROW | ......AC.V | 2192 | |
_A | &uarr | SM32 | UPWARDS ARROW | ......AC.V | 2191 | |
_V | &darr | SM33 | DOWNWARDS ARROW | ......AC.V | 2193 | |
#c | © | SM52 | COPYRIGHT SIGN | .1...5.C.. | 00A9 | |
#r | ® | SM53 | REGISTERED SIGN | .1...5.C.. | 00AE | |
#t | &trade | SM54 | TRADE MARK SIGN | ......AC.. | 2122 | |
*| | ¦ | SM65 | BROKEN BAR | .1...5.C.. | 00A6 | |
^ | ¬ | SM66 | NOT SIGN | .1...5.C.. | 00AC | |
_J | &sung | SM93 | MUSIC NOTE (EIGHTH NOTE IN 10646) | ......AC.. | 266A | |
|
||||||
SP | &blank | SP01 | SPACE | 0.......TV | 0020 | |
! | &excl | SP02 | EXCLAMATION MARK | 0.......TV | 0021 | |
*! | ¡ | SP03 | INVERTED EXCLAMATION MARK | .1...5.CTV | 00A1 | |
" | " | SP04 | QUOTATION MARK | 0.......TV | 0022 | |
' | &apos | SP05 | APOSTROPHE | 0.......TV | 0027 | |
( | &lpar | SP06 | LEFT PARENTHESIS | 0.......TV | 0028 | |
) | &rpar | SP07 | RIGHT PARENTHESIS | 0.......TV | 0029 | |
, | &comma | SP08 | COMMA | 0.......TV | 002C | |
__ | &lowbar | SP09 | LOW LINE | 0.......TV | 005F | |
- | &hyphen | SP10 | HYPHEN-MINUS | 0.......TV | 002D | |
. | &period | SP11 | FULL STOP | 0.......TV | 002E | |
// | &sol | SP12 | SOLIDUS | 0.......TV | 002F | |
: | &colon | SP13 | COLON | 0.......TV | 003A | |
; | &semi | SP14 | SEMICOLON | 0.......TV | 003B | |
? | &quest | SP15 | QUESTION MARK | 0.......TV | 003F | |
*? | ¿ | SP16 | INVERTED QUESTION MARK | .1...5.CTV | 00BF | |
*< | « | SP17 | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK | .1...5.CTV | 00AB | |
*> | » | SP18 | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK | .1...5.CTV | 00BB | |
@( | &lsquo | SP19 | LEFT SINGLE QUOTATION MARK | ......AC.V | 2018 | |
@) | &rsquo | SP20 | RIGHT SINGLE QUOTATION MARK | ......AC.V | 2019 | |
@{ | &ldquo | SP21 | LEFT DOUBLE QUOTATION MARK | ......AC.V | 201C | |
@} | &rdquo | SP22 | RIGHT DOUBLE QUOTATION MARK | ......AC.V | 201D | |
|
||||||
NBSP | &nbsp | SP31 | NO-BREAK SPACE | .12345.C.. | 00A0 | |
SHY | &shy | SP32 | SOFT HYPHEN | .12345.C.. | 00AD |
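For machine processing, a row of the table above can be split into its columns. The sketch below is one possible approach, not a prescribed format: it assumes the columns are separated by " | " exactly as printed here, and it returns the filled-in positions of the ten-position indication column without interpreting them, since their meaning is given with Table 1. The names `Row`, `parse_row` and `set_marks` are illustrative.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    ascii_form: str  # transformation to ASCII
    entity: str      # SGML public entity (or the character itself)
    sid: str         # short identifier
    name: str        # character name
    marks: str       # ten-position indication column
    code: int        # UCS code as an integer


def parse_row(line: str) -> Row:
    """Split one table row into its six columns."""
    body = line.rstrip(" |")  # drop the empty trailing cells
    # Split from the right so that a "|" in the first column survives.
    head, sid, name, marks, code = body.rsplit(" | ", 4)
    ascii_form, entity = head.rsplit(" | ", 1)
    return Row(ascii_form, entity, sid, name, marks, int(code, 16))


def set_marks(marks: str) -> List[str]:
    """Return the positions that are filled in (anything but '.')."""
    return [m for m in marks if m != "."]
```

Splitting from the right is a deliberate choice: the row for VERTICAL LINE has "|" as its ASCII transformation, which a plain left-to-right split would mistake for a column separator.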
ANNEX TO THE ATLANTIC SUBSET (normative)
CHARACTERS CODED IN ISO/IEC 10646
BUT NOT INCLUDED IN THE ATLANTIC SUBSET
VERSION 1.0
1994-10-28, corr. 1998-08-04
J. W. van Wingen
ISO/IEC 10646-1:1993 includes several characters for which no justification for inclusion in the Atlantic Subset could be found. Nevertheless, in certain fields of application, too restricted to be of general interest, characters are used for which ISO/IEC 10646 specifies a code. To serve users who do not want to resort to the complete International Standard for information on only a few additional characters, a selection has been made. To give a complete picture, letters from the Atlantic Subset itself are sometimes added to a list. To allow referencing from other European or national standards, a name is specified for those subrepertoires of ISO/IEC 10646-1 that are identified as suitable. This name, the set it covers, and the coding of the characters are the only elements in this Annex that are normative.
For indicating stress on syllables, or a difference in pronunciation,
an ACUTE ACCENT may be applied in Danish to the letters for vowels. These
accented letters have been officially classified as not essential to the
language and are only rarely used. The following additional characters
require specification of coding.
Transformation to ASCII | SGML public entries | Short identifiers (SID) | Letters name | Tables in ISO 10367 | Code in ISO 10646 |
LATIN SMALL LIGATURE A E WITH ACUTE | .......... | 01FD | |||
LATIN CAPITAL LIGATURE A E WITH ACUTE | .......... | 01FC | |||
LATIN SMALL LETTER A WITH RING AND ACUTE | .......... | 01FB | |||
LATIN CAPITAL LETTER A WITH RING AND ACUTE | .......... | 01FA | |||
LATIN SMALL LETTER O WITH STROKE AND ACUTE | .......... | 01FF | |||
LATIN CAPITAL LETTER O WITH STROKE AND ACUTE | .......... | 01FE |
Named subrepertoire: ADDITIONAL DANISH
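One way to cross-check the six codes of this subrepertoire is to combine each base letter with COMBINING ACUTE ACCENT (0301) and let a normalisation routine compose the pair. The sketch below assumes Python's standard `unicodedata` module, which follows the Unicode character database rather than the text of ISO/IEC 10646-1:1993 itself; `BASES` and `with_acute` are illustrative names.

```python
import unicodedata

# Base letters: ae, AE, a with ring, A with ring, o with stroke, O with stroke.
BASES = "\u00e6\u00c6\u00e5\u00c5\u00f8\u00d8"
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def with_acute(base: str) -> str:
    """Compose a base letter with an acute accent into a single character."""
    return unicodedata.normalize("NFC", base + ACUTE)


# Four-digit codes of the composed letters, in the order of BASES.
codes = [format(ord(with_acute(b)), "04X") for b in BASES]
```

The resulting list can be compared line by line with the codes given for this subrepertoire.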
Up to about 1940 a special font was used for printing Irish. This included
a number of consonants WITH DOT ABOVE. These are:
Transformation to ASCII | SGML public entries | Short identifiers (SID) | Letters name | Tables in ISO 10367 | Code in ISO 10646 |
@b | &bdot | LB29 | LATIN SMALL LETTER B WITH DOT ABOVE | .......... | 1E03 |
@B | &Bdot | LB30 | LATIN CAPITAL LETTER B WITH DOT ABOVE | .......... | 1E02 |
@c | &cdot | LC29 | LATIN SMALL LETTER C WITH DOT ABOVE | .......... | 010B |
@C | &Cdot | LC30 | LATIN CAPITAL LETTER C WITH DOT ABOVE | .......... | 010A |
@d | &ddot | LD29 | LATIN SMALL LETTER D WITH DOT ABOVE | .......... | 1E0B |
@D | &Ddot | LD30 | LATIN CAPITAL LETTER D WITH DOT ABOVE | .......... | 1E0A |
@f | &fdot | LF29 | LATIN SMALL LETTER F WITH DOT ABOVE | .......... | 1E1F |
@F | &Fdot | LF30 | LATIN CAPITAL LETTER F WITH DOT ABOVE | .......... | 1E1E |
@g | &gdot | LG29 | LATIN SMALL LETTER G WITH DOT ABOVE | .......... | 0121 |
@G | &Gdot | LG30 | LATIN CAPITAL LETTER G WITH DOT ABOVE | .......... | 0120 |
@m | &mdot | LM29 | LATIN SMALL LETTER M WITH DOT ABOVE | .......... | 1E41 |
@M | &Mdot | LM30 | LATIN CAPITAL LETTER M WITH DOT ABOVE | .......... | 1E40 |
@p | &pdot | LP29 | LATIN SMALL LETTER P WITH DOT ABOVE | .......... | 1E57 |
@P | &Pdot | LP30 | LATIN CAPITAL LETTER P WITH DOT ABOVE | .......... | 1E56 |
@s | &sdot | LS29 | LATIN SMALL LETTER S WITH DOT ABOVE | .......... | 1E61 |
@S | &Sdot | LS30 | LATIN CAPITAL LETTER S WITH DOT ABOVE | .......... | 1E60 |
@t | &tdot | LT29 | LATIN SMALL LETTER T WITH DOT ABOVE | .......... | 1E6B |
@T | &Tdot | LT30 | LATIN CAPITAL LETTER T WITH DOT ABOVE | .......... | 1E6A |
Named subrepertoire: ADDITIONAL IRISH
Welsh uses w and y as vowels as well. On all vowels the ACUTE, GRAVE, CIRCUMFLEX and DIAERESIS may appear. Not all combinations for w and y were included in the repertoire of ISO/IEC 6937:1994, but these are part of this Subset. For information, those in this Subset but not in ISO/IEC 6937 are marked with a * (star).
Alphabetically sorted to SID
Transformation to ASCII | SGML public entries | Short identifiers (SID) | Letters name | Tables in ISO 10367 | Code in ISO 10646 |
w | LW01 | LATIN SMALL LETTER W | 0.......TV | 0077 | |
W | LW02 | LATIN CAPITAL LETTER W | 0.......TV | 0057 | |
/w | LW11 | LATIN SMALL LETTER W WITH ACUTE | * .......... | 1E83 | |
/W | LW12 | LATIN CAPITAL LETTER W WITH ACUTE | * .......... | 1E82 | |
\w | LW13 | LATIN SMALL LETTER W WITH GRAVE | * .......... | 1E81 | |
\W | LW14 | LATIN CAPITAL LETTER W WITH GRAVE | * .......... | 1E80 | |
>w | LW15 | LATIN SMALL LETTER W WITH CIRCUMFLEX | ......AXTV | 0175 | |
>W | LW16 | LATIN CAPITAL LETTER W WITH CIRCUMFLEX | ......AXTV | 0174 | |
%w | LW17 | LATIN SMALL LETTER W WITH DIAERESIS | * .......... | 1E85 | |
%W | LW18 | LATIN CAPITAL LETTER W WITH DIAERESIS | * .......... | 1E84 | |
y | LY01 | LATIN SMALL LETTER Y | 0.......TV | 0079 | |
Y | LY02 | LATIN CAPITAL LETTER Y | 0.......TV | 0059 | |
/y | LY11 | LATIN SMALL LETTER Y WITH ACUTE | .12...AXTV | 00FD | |
/Y | LY12 | LATIN CAPITAL LETTER Y WITH ACUTE | .12...AXTV | 00DD | |
\y | LY13 | LATIN SMALL LETTER Y WITH GRAVE | * .......... | 1EF3 | |
\Y | LY14 | LATIN CAPITAL LETTER Y WITH GRAVE | * .......... | 1EF2 | |
>y | LY15 | LATIN SMALL LETTER Y WITH CIRCUMFLEX | ......AXTV | 0177 | |
>Y | LY16 | LATIN CAPITAL LETTER Y WITH CIRCUMFLEX | ......AXTV | 0176 | |
%y | LY17 | LATIN SMALL LETTER Y WITH DIAERESIS | .1...5.XTV | 00FF | |
%Y | LY18 | LATIN CAPITAL LETTER Y WITH DIAERESIS | ......AXTV | 0178 |
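The codes for the accented forms of w and y can be looked up by character name. The sketch below assumes that the names in Python's `unicodedata` module agree with the names used in this table, which holds for these sixteen letters; the function name `welsh_codes` is illustrative only.

```python
import unicodedata

ACCENTS = ("ACUTE", "GRAVE", "CIRCUMFLEX", "DIAERESIS")


def welsh_codes(letter: str) -> dict:
    """Map each accented form of a letter, by name, to its four-digit code."""
    out = {}
    for case in ("SMALL", "CAPITAL"):
        for accent in ACCENTS:
            name = f"LATIN {case} LETTER {letter} WITH {accent}"
            # lookup() raises KeyError if no character has this name.
            out[name] = format(ord(unicodedata.lookup(name)), "04X")
    return out
```

Calling `welsh_codes("W")` and `welsh_codes("Y")` yields eight name-to-code pairs each, which can be set against the rows above.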
The letters required for writing the Sami language are included in this Subset, but not those supposed to be needed for the Skolt Sami dialect, spoken by some 500 people only. The letters are specified in Annex A (informative) of ISO 8859-10, registered as ISO-IR 158.
For this set a coding in the BMP was assumed to exist. However, it turned out that not all of these characters are included in the BMP repertoire, or that they may be included under a different name. Given this uncertainty, the two-octet coding of these characters is not listed here.
A.5 Letters in instruction books
In some books for teaching a language, Latin in particular, use is made
of the MACRON and BREVE on the letters a e i o u, indicating long or short
vowels. Of these the following are not included in this Subset, but in
the BMP they have a code assigned to them.
Transformation to ASCII | SGML public entries | Short identifiers (SID) | Letters name | Tables in ISO 10367 | Code in ISO 10646 |
#e | &ebreve | LE23 | LATIN SMALL LETTER E WITH BREVE | .......... | 0115 |
#E | &Ebreve | LE24 | LATIN CAPITAL LETTER E WITH BREVE | .......... | 0114 |
#i | &ibreve | LI23 | LATIN SMALL LETTER I WITH BREVE | .......... | 012D |
#I | &Ibreve | LI24 | LATIN CAPITAL LETTER I WITH BREVE | .......... | 012C |
#o | &obreve | LO23 | LATIN SMALL LETTER O WITH BREVE | .......... | 014F |
#O | &Obreve | LO24 | LATIN CAPITAL LETTER O WITH BREVE | .......... | 014E |
Named subrepertoire: ADDITIONAL LATIN
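The codes in the last column are four hexadecimal digits, so the two-octet form can be read off directly: the first two digits give the first octet and the last two digits the second. A minimal sketch, assuming the most significant octet comes first; the function name `two_octets` is illustrative only.

```python
def two_octets(code: str) -> bytes:
    """Return the two octets (most significant first) for a four-digit code."""
    return int(code, 16).to_bytes(2, "big")


# LATIN SMALL LETTER E WITH BREVE, code 0115, becomes octets 01 and 15.
e_breve = two_octets("0115")
```

For characters in the BMP this gives the same octets as Python's `"utf-16-be"` codec applied to the character itself.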
Some letters included in ISO standards earlier than ISO/IEC 10646 have become obsolete. It is now recognized that one letter for Afrikaans and two for Catalan never existed at all. These letters are deprecated now in ISO/IEC 6937:1994. Five letters for Greenlandic disappeared from its orthography in 1973. All these eight letters are still in the repertoire of the UCS for compatibility with the past. The full name and the code are given at the start of Table 1.
A.7 Vietnamese
Contrary to the writing systems for many of the languages used in its neighbouring countries, Vietnamese is written with Latin script. This situation, and the presence of groups of people from Vietnam living in Europe, makes it desirable to include information on the Vietnamese repertoire and its coding in the BMP.
There is only a single additional consonant letter, but besides the usual five letters for vowels, another seven are needed. Furthermore, the language has six tones on which the meaning of a word strongly depends, and which have to be indicated. The letters F, f, J, j, W, w, Z, z are not needed. As a consequence, the Vietnamese repertoire consists of 178 letters, of which 134 do not occur in LATIN-1. For reasons of space the list of letters with their full names, and their coding in UCS-2 is not included here.
Named subrepertoire: ADDITIONAL VIETNAMESE
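The total of 178 letters can be reproduced by enumeration. The sketch below rests on assumptions that are not spelled out in this Annex: the twelve vowel letters are a, a with breve, a with circumflex, e, e with circumflex, i, o, o with circumflex, o with horn, u, u with horn and y; the six tones are written with no mark, grave, hook above, tilde, acute and dot below; and the seventeen consonant letters are the basic Latin ones without f, j, w and z, plus d with stroke. All names in the code are illustrative.

```python
import unicodedata

# Assumed inventory: 12 vowel letters, 6 tones, 17 consonant letters.
VOWELS = "a\u0103\u00e2e\u00eaio\u00f4\u01a1u\u01b0y"
TONES = ["", "\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]
CONSONANTS = "bcd\u0111ghklmnpqrstvx"


def vietnamese_letters() -> set:
    """Return the small and capital letters as single composed characters."""
    small = {unicodedata.normalize("NFC", v + t) for v in VOWELS for t in TONES}
    small |= set(CONSONANTS)
    return small | {ch.upper() for ch in small}


letters = vietnamese_letters()
total = len(letters)  # 12 * 6 * 2 + 17 * 2
beyond_7bit = sum(1 for ch in letters if ord(ch) > 0x7F)
```

Under these assumptions `total` is 178, and `beyond_7bit`, the number of letters with a code above 007F, is 134.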
Several scripts are transliterated into the Latin script with the help of special letters, mostly those carrying a macron or a dot above or below. Some of these are specified in ISO TC46 standards, others are used in scientific publications under general agreement. Where the source script has only one case of letters, the need for capital forms in Latin transliteration is doubtful, and their use arbitrary. In ISO/IEC 10646-1 coding is provided for transliterated Chinese, Arabic and Sanskrit. Giving further details is beyond the scope of this publication.
Several personal computers use code tables of their own, with special characters that do not occur in ISO/IEC 10367. Nevertheless, many of these are included in the BMP. Because it is difficult to identify them with those in manufacturers' tables, no named subrepertoire is indicated.