3.9 Debugging considerations
In our experience, when Recode does not provide satisfying results,
either the recode program was not called properly, the results were
correct but raised some doubts nevertheless, or the files to recode
were somewhat mangled. Genuine bugs are surely possible, too.
Unless you are already a Recode expert, it might be a good idea to
quickly revisit the tutorial (see Tutorial) or the prior sections in
this chapter, to make sure that you properly formatted your recoding
request. If you intended to use Recode as a filter, make sure that you
did not forget to redirect your standard input (using the < symbol in
the shell, say). Some apparent Recode mysteries are also easily
explained; see Reversibility.
For the other cases, some investigation is needed. To illustrate how
to proceed, let's presume that you want to recode the nicepage file,
coded in UTF-8, into HTML. The problem is that the command
‘recode u8..h nicepage’ yields:
recode: Invalid input in step `UTF-8..ISO-10646-UCS-2'
One good trick is to use recode in filter mode instead of in file
replacement mode (see Synopsis). Another good trick is to use the
‘-v’ option, asking for a verbose description of the recoding steps.
We could rewrite our recoding call as ‘recode -v u8..h <nicepage’, to
get something like:
Request: UTF-8..:iconv:..ISO-10646-UCS-2..HTML_4.0
Shrunk to: UTF-8..ISO-10646-UCS-2..HTML_4.0
[...some output...]
recode: Invalid input in step `UTF-8..ISO-10646-UCS-2'
This might help you to better understand what the diagnostic means.
The recoding request is achieved in two steps: the first recodes
UTF-8 into UCS-2, the second recodes UCS-2 into HTML. The problem
occurs within the first of these two steps, and since the input of
this step is the input file given to Recode, it is this overall input
file which seems to be invalid. Also, when used in filter mode, Recode
processes as much input as possible before the error occurs and sends
the result of this processing to standard output. Since the standard
output has not been redirected to a file, it is merely displayed on
the user screen. By inspecting the end of the resulting HTML output,
that is, what was recoded shortly before the recoding was interrupted,
you may infer where the error lies in the actual UTF-8 input file.
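When the partial output is long, it may be easier to keep only its
last few lines rather than scroll through the whole display. The
following is a minimal sketch of that idea, not a Recode feature: the
helper name recode_tail and its arguments are hypothetical, and it
assumes recode and the standard tail utility are available in the
shell.

```shell
# Sketch only: recode_tail is a hypothetical helper, not a Recode feature.
# Usage: recode_tail REQUEST FILE [LINES]
#   e.g. recode_tail u8..h nicepage 5
recode_tail() {
    rt_request=$1
    rt_file=$2
    rt_lines=${3:-5}
    # Filter mode: the file is only read through standard input, so it is
    # never overwritten.  The diagnostic on standard error is discarded
    # here; only the output produced before the error is kept, and tail
    # shows its last lines.
    recode "$rt_request" <"$rt_file" 2>/dev/null | tail -n "$rt_lines"
}
```

The text shown this way can then be searched for in the input file to
find the neighbourhood of the offending bytes.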
If you have the proper tools to examine the intermediate recoding
data, you might also prefer to reduce the problem to a single step to
better study it. This is what I usually do. For example, the last
recode call above is more or less equivalent to:
recode -v UTF-8..ISO_10646-UCS-2 <nicepage >temporary
recode -v ISO_10646-UCS-2..HTML_4.0 <temporary
rm temporary
If you know that the problem is within the first step, you might
prefer to concentrate on the first recode line. If you know that the
problem is within the second step, you might execute the first recode
line once and for all, and then experiment with the second recode
call, repeatedly reusing the temporary file created by the first call.
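The single-step approach can be wrapped in small shell functions so
that the intermediate file is produced once and then reused. This is
only a sketch under assumptions: the names first_step, second_step and
dump_tail are hypothetical, the charset names are copied from the two
recode lines above, and od is the standard octal dump utility, used
here as one possible tool for examining the intermediate data.

```shell
# Sketch only: first_step, second_step and dump_tail are hypothetical
# helper names.  The charset names are the ones used in the text above.

# Run the first step once, keeping the intermediate UCS-2 data in FILE.
# Usage: first_step INPUT FILE
first_step() {
    recode -v UTF-8..ISO_10646-UCS-2 <"$1" >"$2"
}

# Run the second step on the saved intermediate data, as often as needed.
# Usage: second_step FILE
second_step() {
    recode -v ISO_10646-UCS-2..HTML_4.0 <"$1"
}

# Dump the end of the intermediate data as hexadecimal bytes with
# decimal offsets, to see what the first step wrote last.
# Usage: dump_tail FILE
dump_tail() {
    od -A d -t x1 "$1" | tail -n 5
}

# Possible session (not run here):
#   tmp=$(mktemp)
#   first_step nicepage "$tmp"
#   dump_tail "$tmp"
#   second_step "$tmp"
#   rm -f "$tmp"
```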
Note that the ‘-f’ switch may be used to force the production of HTML
output despite invalid input. This might be satisfying enough for you,
and easier than repairing the input file. That depends on how strict
you would like to be about the precision of the recoding process.
If you later see that your HTML file begins with ‘&lt;html&gt;’ when
you expected ‘<html>’, then Recode might have done a bit more than you
wanted. In this case, your input file was half-UTF-8, half-HTML
already, that is, a mixed file (see Mixed). There is a special ‘-d’
switch for this case. So, you might end up calling
‘recode -fd u8..h nicepage’. Until you are quite sure that you accept
overwriting your input file no matter what, I recommend that you stick
with filter mode.
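One way to stay in filter mode while still ending up with a result
file is to write the output under a different name and leave the
original alone. The following is a minimal sketch, assuming a POSIX
shell; recode_copy is a hypothetical helper name, and how recode
reports success when ‘-f’ is given is not assumed here, only tested
through its exit status.

```shell
# Sketch only: recode_copy is a hypothetical helper, not part of Recode.
# Usage: recode_copy OPTIONS REQUEST INPUT OUTPUT
#   e.g. recode_copy -fd u8..h nicepage nicepage.html
recode_copy() {
    rc_options=$1
    rc_request=$2
    rc_input=$3
    rc_output=$4
    # Filter mode: INPUT is only read.  The result goes to a scratch file
    # that is renamed to OUTPUT only when recode exits successfully.
    if recode "$rc_options" "$rc_request" <"$rc_input" >"$rc_output.tmp"; then
        mv "$rc_output.tmp" "$rc_output"
    else
        rc_status=$?
        rm -f "$rc_output.tmp"
        return "$rc_status"
    fi
}
```

The input file keeps its contents whatever happens, so the experiment
can be repeated with other options or requests.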
If, after such experiments, you seriously think that Recode does not
behave properly, there might be a genuine bug either in the program or
in the library itself, in which case I invite you to contribute a bug
report (see Contributing).