3.5 Reversibility issues
The following options are somewhat related to
reversibility issues:
- ‘-f’
- ‘--force’
With this option, irreversible or otherwise erroneous
recodings are run to completion, and recode does not
exit with a non-zero status when irreversibility is
the only reason it would do so. See Reversibility.
Without this option, Recode tries to protect
you against recoding a file irreversibly over
itself.
Whenever an irreversible recoding or any other
recoding error is met, recode produces a warning
on standard error. The current input file is not
replaced by its recoded version, and recode then
proceeds with the recoding of the next file.
When the program is merely used as a filter,
standard output will have received a partially
recoded copy of standard input, up to the first
error point. After all recodings have been done
or attempted, and if some recoding has been
aborted, recode exits with a
non-zero status.
In releases of Recode prior to version 3.5,
this option was always selected, so it was
rather meaningless. Nevertheless, users were
invited to start using ‘-f’ right away in scripts
calling Recode whenever convenient, in
preparation for the current behaviour.
- ‘-q’
- ‘--quiet’
- ‘--silent’
This option only inhibits warning messages about
irreversible recodings and other such diagnostics.
It has no other effect; in particular, it prevents
neither recodings from being aborted nor recode
from returning a non-zero exit status when
irreversible recodings are met.
This option is set automatically for child
processes when recode splits itself into several
collaborating copies, so that the diagnostic is
issued only once, by the parent. See option ‘-p’.
- ‘-s’
- ‘--strict’
With this option, the user requests that Recode be
very strict while recoding a file, simply dropping
any character that is not explicitly mapped from one
charset to the other. Such a loss is not reversible,
so it makes Recode fail unless option ‘-f’ is also
given as a kind of counter-measure.
Using ‘-s’ without ‘-f’ might make Recode very
sensitive to the slightest file abnormalities.
Although this may irritate some users, such
paranoia is sometimes wanted and useful.
Even though Recode tries hard to keep recodings
reversible, you should not place unconditional
confidence in its ability to do so; keep only
reasonable expectations about reverse recodings.
In particular, consider the following:
- Most transformations are fully reversible for
all inputs, but lose this property whenever
‘-s’ is
specified.
- A few transformations are not meant to be
reversible, by design.
- Reversibility sometimes depends on actual
file contents and cannot be ascertained
beforehand, without reading the file.
- Reversibility is never absolute across
successive versions of this program. Even
correcting a small bug in a mapping could induce
slight discrepancies later.
- Reversibility is easily lost by merging. This
is best explained through an example. If you
reversibly recode a file from charset
A to charset B, and then
reversibly recode the result from charset
B to charset C, you cannot
expect to recover the original file by merely
recoding from charset C directly to
charset A. You will instead have to
recode from charset C back to charset
B, and only then from charset
B to charset A.
- Faulty files create a particular problem.
Consider, for example, recoding from IBM-PC to
Latin-1. Ends of lines are represented as ‘\r\n’
in IBM-PC and as ‘\n’ in Latin-1. A faulty IBM-PC
file containing a ‘\n’ not preceded by ‘\r’ cannot
be translated into a Latin-1 file and then back
unchanged.
- There is another difficulty arising from code
equivalences. For example, in a
LaTeX charset file, the string
‘\^\i{}’
could be recoded back and forth through another
charset and become ‘\^{\i}’. Even if the
resulting file is equivalent to the original one,
it is not identical.
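The end-of-line case from the faulty-files item above can be reproduced without Recode itself. The sketch below is only an approximation under stated assumptions: it uses plain sed to mimic the end-of-line part of an IBM-PC to Latin-1 recoding (drop the ‘\r’ before each end of line) and of the reverse recoding (add it back), converts no other character, and the function names to_latin1, to_ibmpc and round_trip are hypothetical helpers, not Recode features.

```shell
#!/bin/sh
# Sketch only: mimic the end-of-line handling of an IBM-PC <-> Latin-1
# round trip with sed.  No other character is converted here.
cd "$(mktemp -d)" || exit 1
CR=$(printf '\r')

to_latin1() { sed "s/${CR}\$//"; }   # '\r\n' -> '\n'
to_ibmpc()  { sed "s/\$/${CR}/"; }   # '\n'   -> '\r\n'

# Recode a file there and back, then compare with the original.
round_trip() {
  to_latin1 < "$1" | to_ibmpc > "$1.back"
  if cmp -s "$1" "$1.back"; then
    echo "$1: recovered"
  else
    echo "$1: NOT recovered"
  fi
}

printf 'one\r\ntwo\r\nthree\r\n' > good.txt    # well-formed IBM-PC lines
printf 'one\r\ntwo\nthree\r\n'   > faulty.txt  # 'two' ends with a bare '\n'

round_trip good.txt     # prints: good.txt: recovered
round_trip faulty.txt   # prints: faulty.txt: NOT recovered
```

The bare ‘\n’ after "two" comes back as ‘\r\n’, so the faulty file differs from its round-tripped copy even though every step behaved sensibly.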
Unless option
‘-s’ is
used, Recode automatically tries to fill mappings
with invented correspondences, often making them
fully reversible. This filling is not made at
random. The algorithm tries to stick to the
identity mapping and, when this is not possible, it
prefers generating many small permutation cycles,
each involving only a few codes.
For
example, here is how IBM-PC code 186
gets translated to control-U in
Latin-1. Control-U is 21.
Code 21 is the IBM-PC section sign,
which is 167 in Latin-1. Recode cannot
reciprocate 167 to 21, because 167 is the masculine
ordinal indicator within IBM-PC, which
is 186 in Latin-1. Code 186 within
IBM-PC has no Latin-1
equivalent; by assigning it back to 21, Recode
closes this short permutation loop.
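One simple way to obtain this kind of filling is to close every open chain of a partial one-to-one map. The bash sketch below is a hypothetical illustration, not Recode's actual algorithm: the names fill_map, map and used and the 0..255 code space are assumptions. Starting from each code that nothing maps to yet, it follows the known correspondences until it reaches a code with no mapping, and maps that code back to the starting one. A code that is both unmapped and unused thus maps to itself (the identity preference), and every other chain becomes one small permutation cycle. Only the two correspondences from the example are entered, so every other code falls back to identity; with them, the sketch reproduces the loop in which 186 goes to 21, 21 to 167, and 167 back to 186.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not Recode's actual code): complete a partial
# one-to-one map over codes 0..255 into a permutation of those codes.
declare -a map    # map[src]=dst for known correspondences
declare -a used   # used[dst]=1 once some source maps to dst

fill_map() {
  local start code
  for ((start = 0; start < 256; start++)); do
    # Only codes that nothing maps to yet can open a chain.
    [[ -n ${used[start]} ]] && continue
    code=$start
    # Follow known correspondences to the code that has no mapping yet.
    while [[ -n ${map[code]} ]]; do
      code=${map[code]}
    done
    map[code]=$start   # close the chain; a free, unmapped code maps to itself
    used[start]=1
  done
}

# The two correspondences from the example above (all others omitted):
map[21]=167;  used[167]=1   # IBM-PC section sign      -> Latin-1 167
map[167]=186; used[186]=1   # IBM-PC masculine ordinal -> Latin-1 186
# IBM-PC 186 has no Latin-1 equivalent, so it is left unmapped.

fill_map
echo "186 -> ${map[186]}"   # prints: 186 -> 21
echo "65 -> ${map[65]}"     # prints: 65 -> 65
```

Because the starting code has no preimage and the map is one-to-one, each walk is guaranteed to end, and the finished map is a permutation of all 256 codes.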
As a
consequence of this map filling, Recode may
sometimes produce funny characters. They
may look annoying, but they are helpful
when you change your mind and want to revert
to the prior recoding. If you cannot stand these,
use option ‘-s’, which asks for a very
strict recoding.
This map filling sometimes has a few surprising
consequences, which some users have wrongly
interpreted as bugs. Here are two examples.
- In some cases, Recode seems to copy a file
without recoding it. But in fact, it does.
Consider a request:
recode l1..us < File-Latin1 > File-ASCII
cmp File-Latin1 File-ASCII
then cmp will not report any difference. This is
quite normal. Latin-1 gets correctly recoded to
ASCII for the characters both charsets have in
common (the first 128 characters, in this case).
The remaining 128 Latin-1 characters have no ASCII
counterpart. Instead of losing them, Recode elects
to map them to unspecified characters of ASCII,
thus making the recoding reversible. The simplest
way of achieving this is merely to keep those last
128 characters unchanged. The overall effect is
copying the file verbatim.
If you find this behaviour too generous and do not
wish to care about reversibility, simply use option
‘-s’. Recode will then strictly map only those
Latin-1 characters which have an ASCII equivalent,
and will merely drop those which do not. You are
then more likely to observe a difference between
the input and the output file.
- Recoding the wrong way could sometimes give
the false impression that recoding has
almost been done properly. Consider the
requests:
recode 437..l1 < File-Latin1 > Temp1
recode 437..l1 < Temp1 > Temp2
which wrongly declare File-Latin1 to be an IBM-PC
file and recode it to Latin-1. This is surely ill
defined and not meaningful. Yet, after this step has
been applied a second time, you might notice that
many (not all) characters in Temp2 are identical to
those in File-Latin1. Sometimes, people try to
discover how Recode works by experimenting a little
at random, rather than reading and understanding the
documentation; results such as this are surely
confusing, as they give those people the false
feeling that they have understood something.
Reversible codings have the property that, if
applied several times in the same direction, they
eventually bring any character back to its original
value. Since Recode seeks small permutation cycles
when creating reversible codings, apart from the
characters left unchanged by the recoding, most
permutation cycles will be of length 2, fewer of
length 3, and so on. It is therefore to be expected
that applying the recoding twice in the same
direction recovers most characters, but fails to
recover those participating in permutation cycles of
length 3. On the other hand, recoding six times in
the same direction would recover all characters in
cycles of length 1, 2, 3 or 6.
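The cycle arithmetic of the last paragraph can be checked on a toy example. The bash sketch below uses a made-up reversible coding over codes 0..9 (the table perm and the function recovered_after are hypothetical, not taken from any Recode charset) with fixed points, two cycles of length 2 and one cycle of length 3, applies it repeatedly in the same direction, and lists the codes that come back to their original value.

```shell
#!/usr/bin/env bash
# Hypothetical toy reversible coding over codes 0..9: fixed points 0, 1
# and 9, two cycles of length 2, (2 3) and (4 5), and one of length 3, (6 7 8).
perm=(0 1 3 2 5 4 7 8 6 9)   # perm[c] is the code that c is recoded to

# Apply the coding n times in the same direction and list the codes
# that end up back at their original value.
recovered_after() {
  local n=$1 c code i
  local -a out=()
  for c in "${!perm[@]}"; do
    code=$c
    for ((i = 0; i < n; i++)); do
      code=${perm[code]}
    done
    (( code == c )) && out+=("$c")
  done
  echo "${out[*]}"
}

echo "once:      $(recovered_after 1)"   # prints: once:      0 1 9
echo "twice:     $(recovered_after 2)"   # prints: twice:     0 1 2 3 4 5 9
echo "six times: $(recovered_after 6)"   # prints: six times: 0 1 2 3 4 5 6 7 8 9
```

A code in a cycle of length L is back at its original value after n applications exactly when L divides n, which is why two applications miss the 3-cycle while six recover everything in this toy table.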