17. INPUT AND OUTPUT OF CHARACTERS

Before a computer can process characters, they have to be entered into the system; to inspect the result, they have to be displayed. Both aspects deserve some attention, even where the matter lies outside the scope of standards for coded character sets.

It is not true, as is sometimes suggested, that input and output are simply each other's reverse. The hardware requirements, to begin with, are completely different on either side. For the input of characters the "keyboard" is still the most important device; for output a "printer" or a visual display screen is used.

Keyboards

The majority of input systems are based on a keyboard, historically a descendant of the typewriter. Pressing a key results in the dispatch of a code, which formerly was recorded on a length of paper tape or on a punched card. In modern hardware these codes are recorded on a magnetic medium. Keyboards for Latin script offer the basic Latin letters, but in general not the special letters used in northern Europe or letters with accents. The ISO standards in this field have recently been completely revised (ISO 9995, in 8 parts).

The current Netherlands standard NEN 2294 (1986) is still based on the old ISO standards (1090, 1091, 2126, 2530, 3243, 3244, 4169), and will therefore have to be revised in due course. The adaptation to ISO 9995 is not expected to involve technical changes in the layout and designation of keys (see the Annexes). A scheme for a keyboard supporting the ISO 6937 repertoire, based on ISO 9995-3, would merit incorporation into the new NEN standard.

Displaying characters

Just as the entering of characters is modelled on typewriter practice, so the display used to be. But real typography has a tradition of centuries, and with new facilities becoming available, most people are no longer satisfied with printouts of text in a style they consider outdated. Creating those more attractive presentations from coded characters, however, is no small problem, because the information on how the output should look has to be coded in some way.

A text is not an unstructured sequence of characters. Even in the simple case of typewriter output, it consists of "lines", each composed of characters (and spaces) up to a certain maximum. It is a good approach to start by discussing that case, and then to proceed to more complicated environments. The typewriter prints one character at a time, as the result of a key being struck or of reading from a medium. It continues up to the end of the line, and it has to be told where that end occurs. Unless the number of positions on a line is always fixed, the point at which to continue on the next line needs to be given by some code. The actions to be taken on the typewriter are traditionally LINE FEED and CARRIAGE RETURN, in either order. The solution usually chosen is to represent these two by "control characters", to be coded as such, and to be interpreted as such wherever they occur.
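The interpretation of these two control characters can be sketched as follows. This is only an illustration: it assumes the code positions 0x0D and 0x0A commonly assigned to CARRIAGE RETURN and LINE FEED, and, as a simplification not taken from the text, it also accepts a lone CARRIAGE RETURN or a lone LINE FEED as the end of a line. The function name split_lines is hypothetical.

```python
CR = "\r"  # CARRIAGE RETURN, assumed at code position 0x0D
LF = "\n"  # LINE FEED, assumed at code position 0x0A


def split_lines(text: str) -> list[str]:
    """Split text into lines at CR LF, LF CR, a lone CR or a lone LF."""
    lines = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in (CR, LF):
            # A CR directly followed by an LF (or an LF by a CR) is one
            # line end, whichever of the two comes first.
            if i + 1 < len(text) and text[i + 1] in (CR, LF) and text[i + 1] != ch:
                i += 1
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    lines.append("".join(current))
    return lines


print(split_lines("first\r\nsecond\n\rthird"))
```

Two identical control characters in succession are deliberately read as two line ends, so that an empty line is preserved.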

With newer developments in hardware and software the typewriter or line printer model looked more and more antiquated. A closer approach to the facilities of real typography seemed within reach. A larger variety of letter shapes, like that already offered by the famous detachable daisy wheel or by the small type ball, was provided by the newer line printers. According to common usage, letters were grouped in a "font" that could be selected by software. The characters themselves remained what they were; only the way they were displayed was open to change.

The next step was to allow for a font containing letters of varying width. In typography an "m" is wider than an "i", but a normal typewriter could not be built to show this feature. Tables set in columns of spaces would fall out of alignment with these fonts, but a TAB control character could restore the correct layout.
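Why spaces fail and a TAB succeeds can be made concrete with a small calculation. The sketch below is an illustration under invented assumptions: the widths in WIDTHS, the default width and the tab stop distance are hypothetical numbers, not values from any real font or printer.

```python
# Hypothetical character widths in arbitrary units; all other
# characters, including the space, get DEFAULT_WIDTH.
WIDTHS = {"i": 1, "m": 3}
DEFAULT_WIDTH = 2
TAB_STOP = 12  # assumed distance between tab stops, in the same units


def advance(text: str) -> int:
    """Return the horizontal position reached after printing text."""
    x = 0
    for ch in text:
        if ch == "\t":
            # TAB moves to the next tab stop, whatever was printed before.
            x = (x // TAB_STOP + 1) * TAB_STOP
        else:
            x += WIDTHS.get(ch, DEFAULT_WIDTH)
    return x


# Padding with spaces leaves the second column at different positions.
print(advance("iii   "), advance("mmm   "))
# A TAB brings both rows to the same position.
print(advance("iii\t"), advance("mmm\t"))
```

The second column starts at the same position only when the position is stated independently of the widths of the preceding letters, which is what the TAB does.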

If, however, one wants to change from one font to another in the middle of a text, to use letters of a larger size for headings, or to apply "highlighting" by means of "italic" or "bold" characters, the points where the change has to occur must be marked in the text itself. The question is then which method to use for that purpose. One way out is to apply control characters. But too few simple control characters are available, so the alternative is a control function represented by a control sequence of bytes, constructed according to a certain syntax. These sequences tend to be complicated, and are thus best generated by software. Nor can they easily be changed by hand, so that an un-revisable text is created as soon as the text is "formatted".
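As an illustration of such a control sequence, the sketch below assumes the SELECT GRAPHIC RENDITION function of ISO/IEC 6429 (ECMA-48), a standard not named in the text above. In that syntax the sequence consists of ESCAPE, "[", numeric parameters separated by ";", and the final byte "m"; parameter 1 selects bold, 3 italic, and 0 the default rendition. The helper names sgr and embolden are hypothetical.

```python
ESC = "\x1b"  # the ESCAPE control character


def sgr(*params: int) -> str:
    """Build a SELECT GRAPHIC RENDITION control sequence (assumed syntax)."""
    return ESC + "[" + ";".join(str(p) for p in params) + "m"


BOLD_ON = sgr(1)    # bold or increased intensity
ITALIC_ON = sgr(3)  # italic
RESET = sgr(0)      # back to the default rendition


def embolden(word: str) -> str:
    """Mark the start and end of a bold stretch in the text itself."""
    return BOLD_ON + word + RESET


plain = "A bold word"
formatted = "A " + embolden("bold") + " word"

# repr() makes the otherwise invisible control sequences visible.
print(repr(formatted))
print(len(plain), len(formatted))
```

The formatted string carries eight extra code positions that are not part of the wording, which shows why such a text is awkward to revise by hand.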

Apart from control characters, the texts considered up to now still contained exactly as many graphic characters as there were codes, whether septets or octets. But typography requires more, and here the paths separate. If further processing of the text is wanted, such as counting letters or words, no additional elements meant for display only can be allowed. We are thus forced to distinguish two types of text: those classified as "revisable" and those classified as "final form" (or "processable" and "formatted" in the terminology of ODA, ISO 8613).
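The effect of display-only elements on further processing can be shown with a letter count. The sketch below is an assumption-laden illustration: it takes control sequences in the syntax of ISO/IEC 6429 (ESCAPE, "[", parameter bytes, intermediate bytes, one final byte) as the display-only elements, and the sample texts and function names are invented for the purpose.

```python
import re

# ESCAPE "[" followed by parameter bytes (0x30-0x3F), intermediate
# bytes (0x20-0x2F) and one final byte (0x40-0x7E); assumed syntax.
CONTROL_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def count_letters(text: str) -> int:
    """Count the alphabetic characters in text."""
    return sum(1 for ch in text if ch.isalpha())


def strip_control_sequences(text: str) -> str:
    """Remove the display-only control sequences from text."""
    return CONTROL_SEQUENCE.sub("", text)


revisable = "A bold word"
final_form = "A \x1b[1mbold\x1b[0m word"

print(count_letters(revisable))
print(count_letters(final_form))
print(count_letters(strip_control_sequences(final_form)))
```

The final byte "m" of each control sequence is itself a letter code, so a naive count on the final-form text comes out too high; only after the sequences are stripped does the count agree with the revisable text.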

As has been said, if one wants to instruct the printer to change from one font to another, that change has to be marked in the text itself. But these codes can also be inserted by writing "markers" at the right place with normal graphic characters, and having the "marked-up" text processed by a program that produces the control sequences. This method is also suitable for indicating paragraphs, indentation, or the transition to a new page, which is usually called "formatting". If the printer just prints whole pages, this is obviously the right term.
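A program of this kind can be very small. In the sketch below the markers "<<bold>>", "<<italic>>" and "<<plain>>" are invented for the illustration, and the control sequences they are translated into are assumed to follow ISO/IEC 6429; neither choice comes from the text above.

```python
# Hypothetical markers, written with normal graphic characters, and the
# control sequences (assumed ISO/IEC 6429 syntax) that replace them.
MARKERS = {
    "<<bold>>": "\x1b[1m",
    "<<italic>>": "\x1b[3m",
    "<<plain>>": "\x1b[0m",
}


def process(marked_up: str) -> str:
    """Replace every marker in the marked-up text by its control sequence."""
    result = marked_up
    for marker, sequence in MARKERS.items():
        result = result.replace(marker, sequence)
    return result


source = "A <<bold>>bold<<plain>> word"
print(repr(process(source)))
```

The marked-up source remains readable and can be revised with any editor; the control sequences exist only in the output of the program.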

Another level of abstraction now comes into sight. If we define some kind of syntax for the markers, we may be able to use them as a description of the structure of the text, independent of the actual printer. Then we need a higher-level formatter to produce executable markers suitable for the printer selected. This is the approach taken by SGML (Standard Generalized Markup Language). Text marked up in SGML is freely exchangeable between very different hardware and software, and can be printed after processing by a local "formatter". Such a program, which arranges the "boxes" containing letters and the areas of blank space over the pages, to mention the main features, could be based on DSSSL, the Document Style Semantics and Specification Language (ISO/IEC 10179), and could generate statements in "PostScript", or in SPDL, its successor (Standard Page Description Language, ISO/IEC 10180).
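The separation between structure and presentation can be sketched in a few lines. The example below is not SGML processing proper: it uses a simplified, SGML-like tag syntax without nesting or attributes, the element names "title" and "para" are invented, and the two formatters stand in for two hypothetical output devices, one limited to typewriter facilities and one that accepts control sequences in the assumed syntax of ISO/IEC 6429.

```python
import re

# One document, marked up by structure only (hypothetical element names).
DOCUMENT = "<title>Keyboards</title><para>Pressing a key dispatches a code.</para>"

# Matches <name>content</name>; nesting and attributes are not handled.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse(text: str) -> list[tuple[str, str]]:
    """Return the (element name, content) pairs found in text."""
    return [(m.group(1), m.group(2)) for m in ELEMENT.finditer(text)]


def format_typewriter(elements: list[tuple[str, str]]) -> str:
    """Formatter for a device with typewriter facilities only."""
    lines = []
    for name, content in elements:
        if name == "title":
            lines.append(content.upper())  # capitals instead of bold
            lines.append("")               # blank line after the title
        else:
            lines.append(content)
    return "\n".join(lines)


def format_with_control_sequences(elements: list[tuple[str, str]]) -> str:
    """Formatter for a device that accepts bold-on and reset sequences."""
    lines = []
    for name, content in elements:
        if name == "title":
            lines.append("\x1b[1m" + content + "\x1b[0m")
        else:
            lines.append(content)
    return "\n".join(lines)


elements = parse(DOCUMENT)
print(format_typewriter(elements))
print(repr(format_with_control_sequences(elements)))
```

The document itself says only that "Keyboards" is a title; how a title is to be shown is decided by whichever formatter is available locally.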

It should be pointed out that what is described here is based on systems implementing international standards, in which every detail is documented so that the interested user can inspect it; in short, they are "open". Many of the "word processors" in circulation code texts in a way that is quite specific to the product, and the user is not expected to look inside. Texts coded in this way can be exchanged directly only between processors of the same brand; otherwise a special converter is required, and even then complete success is not always achieved.