5.7 Fully interpreted UCS dump
Another device may be used to get fully interpreted dumps of a UCS-2 stream of characters, with one UCS-2 character displayed per full output line. Each line receives the UCS-2 value of the character, its RFC 1345 mnemonic if one exists, and a descriptive comment for that character. As each input character produces its own output line, beware that the output file from this conversion may be much, much bigger than the input file. This charset is available in Recode under the name dump-with-names.
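As a rough illustration of this line format, here is a minimal Python sketch that emits one line per character. It is not Recode's implementation:

- The function `dump_with_names` and the `SAMPLE_MNEMONICS` table are hypothetical.
- The table holds only the two non-trivial mnemonics that appear in the sample output below.
- Descriptions come from Python's `unicodedata` module, which has no names for control characters such as line feed.

```python
import unicodedata

# Hypothetical, tiny subset of RFC 1345 mnemonics, copied from the sample
# output in this section; Recode's real table is far larger.
SAMPLE_MNEMONICS = {" ": "SP", "\n": "LF"}


def dump_with_names(text: str) -> list[str]:
    """Approximate a dump-with-names listing: one line per character."""
    lines = ["UCS2   Mne   Description", ""]
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError(f"U+{code:X} does not fit in UCS-2")
        # In the sample output, printable ASCII characters are their own mnemonic.
        default = ch if " " < ch < "\x7f" else ""
        mnemonic = SAMPLE_MNEMONICS.get(ch, default)
        # unicodedata has no names for control characters, so they stay blank.
        name = unicodedata.name(ch, "").lower()
        lines.append(f"{code:04X}   {mnemonic:<3}   {name}".rstrip())
    return lines


for line in dump_with_names("Hello, world!\n"):
    print(line)
```

Even for plain ASCII input, a single input byte such as ‘H’ becomes the 35-character line ‘0048   H     latin capital letter h’. That is why the dump grows so quickly.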
This dump-with-names feature has been implemented as a charset rather than a surface, which is surely debatable. The current implementation nevertheless allows for dumping charsets other than UCS-2. For example, the command ‘recode l2..full < input’ implies a necessary conversion from Latin-2 to UCS-2, as dump-with-names can only be reached from UCS-2. In such cases, Recode does not display the original Latin-2 codes in the dump, only the corresponding UCS-2 values. To give a simpler example, the command
echo 'Hello, world!' | recode us..dump
produces the following output:
UCS2   Mne   Description

0048   H     latin capital letter h
0065   e     latin small letter e
006C   l     latin small letter l
006C   l     latin small letter l
006F   o     latin small letter o
002C   ,     comma
0020   SP    space
0077   w     latin small letter w
006F   o     latin small letter o
0072   r     latin small letter r
006C   l     latin small letter l
0064   d     latin small letter d
0021   !     exclamation mark
000A   LF    line feed (lf)
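The remark about ‘recode l2..full’ can be illustrated with a small Python sketch. It assumes that Python's `iso8859_2` codec agrees with Recode's Latin-2 table for the byte used. The sketch shows that a Latin-2 byte such as 0xB1 reaches the dump only as its UCS-2 value U+0105, and the byte value 0xB1 itself is no longer visible. The helper name `latin2_to_ucs2_values` is hypothetical.

```python
def latin2_to_ucs2_values(data: bytes) -> list[str]:
    """Map Latin-2 bytes to the UCS-2 values a dump would list.

    Only the UCS-2 code of each character survives; the original
    Latin-2 byte values are not kept.
    """
    return [f"{ord(ch):04X}" for ch in data.decode("iso8859_2")]


# Byte 0xB1 is 'ą' in Latin-2, so the dump shows 0105, not B1.
print(latin2_to_ucs2_values(b"a\xb1"))  # ['0061', '0105']
```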
The descriptive comment is given in English and ASCII, but if the English description is not available and a French one is, the French description is given instead, using Latin-1. However, if the LANGUAGE or LANG environment variable begins with the letters ‘fr’, then listing preference goes to French when both descriptions are available.
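The preference rule above can be sketched in Python as follows. The function `pick_description` is hypothetical. The sketch also assumes that LANGUAGE is consulted before LANG; the text does not say which variable wins when both are set.

```python
import os
from typing import Mapping, Optional


def pick_description(english: Optional[str], french: Optional[str],
                     environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the English or French description per the rule described above."""
    # Assumption: LANGUAGE is checked first, then LANG.
    lang = environ.get("LANGUAGE") or environ.get("LANG") or ""
    if english and french:
        # Both exist: French wins only when the variable starts with "fr".
        return french if lang.startswith("fr") else english
    # Otherwise fall back to whichever description exists (or None).
    return english or french


print(pick_description("space", "espace", {"LANG": "fr_FR.UTF-8"}))  # espace
```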
Here is another example. To get the long description of code 237 in the Latin-5 table, one may use the following command.
echo -n 237 | recode l5/d..dump
If your echo does not grok ‘-n’, use ‘echo 237\c’ instead. Here is how to see what Unicode U+03C6 means, while getting rid of the title lines.
echo -n 0x03C6 | recode u2/x2..dump | tail -n +3
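To predict what the last two commands should report without running Recode, a Python sketch can decode the same inputs. It makes two assumptions:

- Python's `iso8859_9` codec (Latin-5) agrees with Recode's Latin-5 table.
- `unicodedata` names are close enough to Recode's own descriptions, although the wording may differ.

Both function names are hypothetical.

```python
import unicodedata


def describe_latin5_decimal(code: int) -> str:
    """Rough counterpart of 'recode l5/d..dump' for one decimal code."""
    ch = bytes([code]).decode("iso8859_9")  # Latin-5 is ISO 8859-9
    return f"{ord(ch):04X} {unicodedata.name(ch, '').lower()}"


def describe_ucs2_hex(value: str) -> str:
    """Rough counterpart of 'recode u2/x2..dump' for one hex value."""
    ch = chr(int(value, 16))  # int() accepts the "0x" prefix with base 16
    return f"{ord(ch):04X} {unicodedata.name(ch, '').lower()}"


print(describe_latin5_decimal(237))  # 00ED latin small letter i with acute
print(describe_ucs2_hex("0x03C6"))   # 03C6 greek small letter phi
```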