4.5 Handling errors
The recode program, while using the Recode
library, needs to control whether recoding problems
are reported or not, and then reflect them in the
exit status. The program should also instruct the
library whether the recoding should be abruptly
interrupted when an error is met (thus sparing
processing when it is known in advance that a wrong
result would be discarded anyway), or whether it
should proceed nevertheless. The library groups
errors into the following levels, listed in order
of increasing severity.
RECODE_NO_ERROR
-
No error was met on previous library calls.
RECODE_NOT_CANONICAL
-
The input text used one of the many
alternative codings for some phenomenon, but
not the one Recode would canonically
generate. So, if the reverse recoding is later
attempted, it will produce a text having the
same meaning as the original text, yet
not byte-identical to it.
For example, a Base64 block in
which end-of-lines appear elsewhere than at
every 76 characters is not canonical. An
e-circumflex in TeX which is coded as
‘\^{e}’
instead of ‘\^e’ is not canonical.
RECODE_AMBIGUOUS_OUTPUT
-
It has been detected that, if the reverse
recoding were attempted on the text output by
this recoding, the original text would not be
obtained, only because an ambiguity was
accidentally generated in the output text. This
ambiguity would then cause the wrong
interpretation to be taken.
Here are a few examples. If the
Latin-1 sequence
‘e^’ is
converted to Easy French and back, the result
will be interpreted as e-circumflex and so
will not reflect the intent of the original two
characters. If an IBM-PC
text containing an isolated LF is recoded to
Latin-1 and back, a spurious
CR will be inserted before that
LF.
Currently, there are many cases in which the
library does not properly detect the production
of ambiguous output, as this detection is
sometimes difficult to accomplish, or to
accomplish speedily.
RECODE_UNTRANSLATABLE
-
One or more input characters could not be
recoded, because there is simply no
representation for them in the output
charset.
Here are a few examples. Non-strict mode
often allows Recode to compute on-the-fly
mappings for unrepresentable characters, but
strict mode prohibits such attribution of
reversible translations, so strict mode may
often trigger this error. Most
UCS-2 codes used to represent
Asian characters cannot be expressed in the
various Latin charsets.
RECODE_INVALID_INPUT
-
The input text does not comply with the coding
it is declared to hold. So, no reverse recoding
could ever reproduce this text, because Recode
should never produce invalid output.
Here are a few examples. In strict mode,
ASCII text is not allowed to
contain characters with the eighth bit set.
UTF-8 encodings ought to be
minimal[1].
RECODE_SYSTEM_ERROR
-
The underlying system reported an error while the
recoding was going on, likely an input/output
error. (This error symbol is currently unused in
the library.)
RECODE_USER_ERROR
-
The programmer or user requested something the
recoding library is unable to provide, or used
the API wrongly. (This error symbol is currently
unused in the library.)
RECODE_INTERNAL_ERROR
-
Something really wrong, which should normally
never happen, was detected within the recoding
library. This might be due to genuine bugs in the
library, or perhaps to uninitialised or
overwritten arguments to the API. (This error
symbol is currently unused in the library.)
RECODE_MAXIMUM_ERROR
-
This error code should never be returned; it is
only used internally as a sentinel for the list
of all possible error codes.
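Since the levels are listed by increasing severity, a program can treat them as ordered values and compare them directly. The sketch below mirrors the list in a local C enum; the enumerator order follows the list above, but the numeric values it implies and the helpers `error_level_name` and `utf8_check_two_byte` are assumptions made for illustration, not the library's actual declarations. The second helper classifies one of the examples above: a two-byte UTF-8 sequence whose lead byte is 0xC0 or 0xC1 can only encode a code point below U+0080, which also has a shorter one-byte form, so the encoding is not minimal and counts as invalid input.

```c
/* Local mirror of the error levels, in the order listed above
   (increasing severity).  The real declarations live in the Recode
   headers; the numeric values here are an assumption. */
enum recode_error
{
  RECODE_NO_ERROR,
  RECODE_NOT_CANONICAL,
  RECODE_AMBIGUOUS_OUTPUT,
  RECODE_UNTRANSLATABLE,
  RECODE_INVALID_INPUT,
  RECODE_SYSTEM_ERROR,
  RECODE_USER_ERROR,
  RECODE_INTERNAL_ERROR,
  RECODE_MAXIMUM_ERROR          /* sentinel only, never a real result */
};

/* Hypothetical helper: a printable name for each level. */
static const char *
error_level_name (enum recode_error level)
{
  static const char *const names[RECODE_MAXIMUM_ERROR] = {
    "no error", "not canonical", "ambiguous output", "untranslatable",
    "invalid input", "system error", "user error", "internal error"
  };

  /* The sentinel and out-of-range values have no name. */
  return (unsigned) level < RECODE_MAXIMUM_ERROR ? names[level] : "?";
}

/* Hypothetical helper: classify a two-byte UTF-8 sequence.  Lead bytes
   0xC0 and 0xC1 only encode code points below U+0080, which have a
   shorter one-byte form, so such sequences are not minimal. */
static enum recode_error
utf8_check_two_byte (unsigned char lead, unsigned char cont)
{
  if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
    return RECODE_INVALID_INPUT;        /* not a two-byte sequence */
  if (lead < 0xC2)
    return RECODE_INVALID_INPUT;        /* overlong, hence not minimal */
  return RECODE_NO_ERROR;
}
```

For instance, `utf8_check_two_byte (0xC3, 0xA9)`, the minimal encoding of U+00E9, yields `RECODE_NO_ERROR`, while `utf8_check_two_byte (0xC1, 0xA9)` yields `RECODE_INVALID_INPUT`.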
One should be able to set the error level
threshold for returning failure at the end of
recoding, and also the threshold for immediate
interruption. If many errors occur while the
recoding proceeds, none of them severe enough to
interrupt it, then the most severe error is
retained, while the others are forgotten[2]. So,
in case of an error, the possible actions currently
are:
- do nothing and let go, returning success at
the end of recoding,
- just let go for now, but return failure at
the end of recoding,
- interrupt recoding right away and return
failure now.
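The choice among these three actions can be expressed as a comparison of the most severe error against the two thresholds. In this sketch, `enum error_action` and `choose_action` are hypothetical names; levels are plain ints ordered by severity with 0 meaning no error, and a threshold is assumed to be reached when a level is greater than or equal to it. These conventions are illustrative assumptions, not a statement of how the library implements the check.

```c
/* Hypothetical labels for the three possible actions. */
enum error_action
{
  ACTION_IGNORE,      /* let go, return success at the end */
  ACTION_FAIL_LATER,  /* let go for now, return failure at the end */
  ACTION_ABORT_NOW    /* interrupt right away and return failure */
};

/* Pick an action for the most severe error met so far.  Levels are
   plain ints ordered by severity, 0 meaning "no error" (an assumption
   mirroring RECODE_NO_ERROR).  A threshold is assumed to be reached
   when the level is greater than or equal to it. */
static enum error_action
choose_action (int error_so_far, int fail_level, int abort_level)
{
  if (error_so_far == 0)
    return ACTION_IGNORE;             /* nothing went wrong */
  if (error_so_far >= abort_level)
    return ACTION_ABORT_NOW;          /* severe enough to stop at once */
  if (error_so_far >= fail_level)
    return ACTION_FAIL_LATER;         /* keep going, report failure */
  return ACTION_IGNORE;               /* below both thresholds */
}
```

The abort threshold is checked first because interruption is the stronger reaction: an error severe enough to stop the recoding also warrants reporting failure.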
See Task
level, and particularly the description of the
fields fail_level,
abort_level and
error_so_far, for more information
about how errors are handled.
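The interplay of the three fields can be modelled with a small simulation. Here `struct sketch_task` and `sketch_perform` are hypothetical stand-ins, not the library's task structure or functions: each recoding step is reduced to the error level it reports, levels are plain ints ordered by severity with 0 meaning no error, and a threshold is assumed to be reached when a level is greater than or equal to it. The loop keeps only the most severe error in `error_so_far`, stops at once on an error reaching `abort_level`, and otherwise decides success or failure from `fail_level` at the end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a recoding task, holding only the three
   fields discussed above, as ints ordered by severity (0 = no error).
   This is not the library's actual task structure. */
struct sketch_task
{
  int fail_level;     /* report failure at the end from this level up */
  int abort_level;    /* interrupt immediately from this level up */
  int error_so_far;   /* most severe error met so far */
};

/* Process per-step error levels as a recoding loop might report them.
   Returns true for success; *steps_done receives the number of steps
   actually processed. */
static bool
sketch_perform (struct sketch_task *task, const int *errors, size_t count,
                size_t *steps_done)
{
  size_t i;

  task->error_so_far = 0;
  for (i = 0; i < count; i++)
    {
      int level = errors[i];

      /* Retain the most severe error; lesser ones are forgotten. */
      if (level > task->error_so_far)
        task->error_so_far = level;

      /* Interrupt right away if this error reaches abort_level. */
      if (level != 0 && level >= task->abort_level)
        {
          *steps_done = i + 1;
          return false;
        }
    }
  *steps_done = count;

  /* Otherwise, success or failure is decided only at the end. */
  return task->error_so_far == 0 || task->error_so_far < task->fail_level;
}
```

For instance, with `fail_level` at 1 and `abort_level` at 4, the step levels {0, 4, 1} stop after the second step with `error_so_far` equal to 4, while {0, 1, 0, 3, 2} run to completion, retain 3 as the most severe error, and still report failure.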