13.2 Representation for end of lines
The same
charset may differ slightly from one system to
another, simply because ends of lines are
not represented identically on all systems. Within
Recode, an end of line is represented by
the ASCII or UCS code
with value 10, or LF. Other conventions
for representing ends of lines are available through
surfaces.
CR
-
This
convention is popular on Apple's Macintosh
machines. When this surface is applied, each
line is terminated by CR, which has
ASCII value 13. Unless the library
is operating in strict mode, adding or removing
the surface will in fact exchange
CR and LF, for better
reversibility. In strict mode, however, the
exchange does not happen: any CR
is copied verbatim while applying the
surface, and any LF is copied
verbatim while removing it.
This surface is available in Recode under
the name CR and does not have any
aliases. This is the implied surface for the
Apple Macintosh related charsets.
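The difference between the two modes can be illustrated with a minimal Python sketch. This is not Recode's implementation or API: the function names and the `strict` parameter are hypothetical, and the sketch only mirrors the behaviour described above on byte strings.

```python
CR = b"\r"  # ASCII 13
LF = b"\n"  # ASCII 10

# Translation table that exchanges CR and LF and leaves other bytes alone.
_SWAP = bytes.maketrans(CR + LF, LF + CR)


def apply_cr_surface(data: bytes, strict: bool = False) -> bytes:
    """Add the CR surface (hypothetical helper, not a Recode call)."""
    if strict:
        # Strict mode: only LF is rewritten; an existing CR is copied verbatim.
        return data.replace(LF, CR)
    # Non-strict mode: CR and LF are exchanged, so the step can be undone.
    return data.translate(_SWAP)


def remove_cr_surface(data: bytes, strict: bool = False) -> bytes:
    """Remove the CR surface (hypothetical helper, not a Recode call)."""
    if strict:
        # Strict mode: only CR is rewritten; an existing LF is copied verbatim.
        return data.replace(CR, LF)
    return data.translate(_SWAP)


text = b"one\ntwo\rthree\n"  # contains a stray CR inside the text
# The non-strict round trip restores the input exactly.
assert remove_cr_surface(apply_cr_surface(text)) == text
# The strict round trip does not: the stray CR comes back as LF.
assert remove_cr_surface(apply_cr_surface(text, strict=True), strict=True) != text
```

The round-trip check at the end shows why the exchange gives better reversibility: in the sketch's strict mode, a CR already present in the text cannot be told apart from a converted line end once the surface has been applied.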
CR-LF
-
This convention is popular on Microsoft
systems running on IBM PCs and compatibles. When
this surface is applied, each line is
terminated by a sequence of two characters: one
CR followed by one LF, in
that order.
For
compatibility with older MS-DOS systems,
removing a CR-LF surface
discards the first C-z encountered,
which has ASCII value 26, together with
everything following it in the text. Adding
this surface does not, however, append a
C-z to the result.
This surface is available in
Recode under the name CR-LF and
has cl for an alias. This is the
implied surface for the IBM or Microsoft
related charsets or code pages.
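The asymmetry between adding and removing this surface can be shown with a minimal Python sketch. It is not Recode's implementation or API: the function names are hypothetical, and the handling of a lone CR or a lone LF, which this section does not describe, is left out.

```python
CRLF = b"\r\n"    # CR (ASCII 13) followed by LF (ASCII 10)
LF = b"\n"
CTRL_Z = b"\x1a"  # C-z, ASCII 26, the old MS-DOS end-of-file marker


def apply_crlf_surface(data: bytes) -> bytes:
    """Add the CR-LF surface (hypothetical helper, not a Recode call)."""
    # Each LF line end becomes CR LF; no C-z is appended to the result.
    return data.replace(LF, CRLF)


def remove_crlf_surface(data: bytes) -> bytes:
    """Remove the CR-LF surface (hypothetical helper, not a Recode call)."""
    # Drop the first C-z and everything after it, then map CR LF back to LF.
    head, _, _ = data.partition(CTRL_Z)
    return head.replace(CRLF, LF)


dos_text = b"first\r\nsecond\r\n\x1aleftover bytes"
assert remove_crlf_surface(dos_text) == b"first\nsecond\n"
assert apply_crlf_surface(b"first\nsecond\n") == b"first\r\nsecond\r\n"
```

In this sketch, text that follows a C-z is lost on removal and is not restored when the surface is added again, so the round trip is only exact for input without a C-z.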
Some other
charsets have their own representation for an
end of line, which is different from LF.
For example, this is the case for various
EBCDIC charsets, and for
Icon-QNX. The recoding of ends of lines
is intimately tied into such charsets, so it is not
available separately as a surface.