4.2 Request level functions
The request-level functions are meant to cover most of the recoding
needs programmers may have; they should provide all the
usual functionality. Their API is almost stable by
now.
To get started with request-level functions, here is a
full example of a program whose sole job is to
filter ibmpc code on its standard
input into latin1 code on its standard
output.
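#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <recode.h>

/* Program name, set from argv[0] for use by the recoding library.  */
const char *program_name;

int
main (int argc, char *const *argv)
{
  program_name = argv[0];

  /* Outer-level state, shared by all requests of this program.  */
  RECODE_OUTER outer = recode_new_outer (true);
  /* A fresh request, tied to that outer state.  */
  RECODE_REQUEST request = recode_new_request (outer);
  bool success;

  /* Describe the recoding: from ibmpc to latin1.  */
  recode_scan_request (request, "ibmpc..latin1");
  /* Recode all of standard input onto standard output.  */
  success = recode_file_to_file (request, stdin, stdout);

  /* Release the request first, then the outer state.  */
  recode_delete_request (request);
  recode_delete_outer (outer);
  exit (success ? EXIT_SUCCESS : EXIT_FAILURE);
}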
The header file recode.h declares a
RECODE_REQUEST structure, which the
programmer should use to declare a variable in
their program. This request variable is
given as the first argument to all request-level
functions and, in most cases, may be considered
opaque.
Suppose an application does a lot of recoding using only a
few different requests. For speed, the
RECODE_REQUEST structure should ideally be cached
for each kind of request, so that request-level
initialisation is not redone for each and every
string translated. The speedup should be more
apparent when Recode is able to optimize the work by
building on the fly, within the structure, new
specialized recoding steps and their associated
data tables.
- Initialisation functions
RECODE_REQUEST recode_new_request (outer);
bool recode_delete_request (request);
No request variable may be used in
other request-level functions of the recoding
library before having been initialised by
recode_new_request. There may be
many such request variables, in
which case they are independent of one another
and each needs to be initialised separately.
To avoid memory leaks, a request
variable should not be initialised a second
time without first calling
recode_delete_request to
“un-initialise” it.
As with recode_delete_outer,
calling recode_delete_request
just before program termination, as in the example
above, may be left out.
- Fields of struct recode_request
Here are the fields of a struct
recode_request which may be meaningfully
changed once a request has been
initialised by recode_new_request,
but before it gets used. In practice, these
fields rarely need to be changed. To access
the fields, you need to include recodext.h
instead of recode.h, in which case
there is also a greater chance that you will need to
recompile your programs when a new version of the
recoding library gets installed.
verbose_flag
- This field is initially
false. When set to
true, the library echoes to
stderr the sequence of elementary recoding
steps needed to achieve the requested
recoding.
diaeresis_char
- This
field is initially the ASCII value of a
double quote ", but it may also be
the ASCII value of a colon :. In the
texte charset, some countries
use double quotes to mark diaeresis, while
other countries prefer colons. This field
contains the diaeresis character for the
texte charset.
make_header_flag
- This
field is initially
false. When
set to true, it indicates that
the program is merely trying to produce a
recoding table in source form rather than
completing any actual recoding. In such a
case, the optimisation of the step sequence can
be attempted much more aggressively. If the
step sequence cannot be reduced to a single
step, table production will fail.
diacritics_only
- This
field is initially
false. For the
HTML and LaTeX
charsets, it is often convenient to recode the
diacriticized characters only, while leaving
other HTML code using ampersands or
angle brackets, or LaTeX code using
backslashes, unrecoded. Set the field to
true to get this behaviour.
In the other charset, one can then edit the text as
well as the HTML or LaTeX directives.
ascii_graphics
- This
field is initially
false, and
relates to characters 176 to 223 in the
ibmpc charset, which are used to
draw boxes. When set to true,
while recoding out of ibmpc,
ASCII characters are selected so as to
graphically approximate these boxes.
- Study of request strings
bool recode_scan_request (request, "string");
The main role of a request variable is
to describe a set of recoding transformations.
Function recode_scan_request
studies the given string and stores
an internal representation of it into
request. Note that string
may be a full-fledged Recode request, possibly
including surface specifications, intermediary
charsets, sequences, aliases or abbreviations
(see Requests).
The internal representation automatically
receives some pre-conditioning and
optimisation, so the request may
later be used many times to achieve many
actual recodings. It would not be efficient to
call recode_scan_request many
times with the same string; it is
better to have many request variables
instead.
- Actual recoding jobs
Once the request variable holds
the description of a recoding transformation, a
few functions use it to achieve an actual
recoding. The input of a recoding may be a
string, an in-memory buffer, or a file; the output
may be a buffer or a file (or a string, for
recode_string only).
Functions with names like
recode_input-type_to_output-type
request an actual recoding, and are described
below. It is easy to remember which arguments
each function accepts once a few simple
principles for each possible type are grasped.
However, one of the recoding functions escapes
these principles and is discussed separately,
first.
char *recode_string (request, string);
The function
recode_string recodes
string according to
request, and directly returns the
resulting recoded string, freshly allocated, or
NULL if the recoding could not
succeed for some reason. When this function is
used, it is the responsibility of the
programmer to ensure that the memory used by
the returned string is later reclaimed.
bool recode_string_to_buffer (request,
input_string,
&output_buffer, &output_length, &output_allocated);
bool recode_string_to_file (request,
input_string,
output_file);
bool recode_buffer_to_buffer (request,
input_buffer, input_length,
&output_buffer, &output_length, &output_allocated);
bool recode_buffer_to_file (request,
input_buffer, input_length,
output_file);
bool recode_file_to_buffer (request,
input_file,
&output_buffer, &output_length, &output_allocated);
bool recode_file_to_file (request,
input_file,
output_file);
All these functions return a
bool result, false
meaning that the recoding was not successful,
often because of reversibility issues. The name
of each function indicates which type
it reads and which type it produces. Let's
discuss these three types in turn.
- string
-
A string is merely an in-memory buffer
which is terminated by a
NUL
character (using as many bytes as needed),
instead of being described by a byte
length. For input, a pointer to the buffer
is given through one argument.
Notably, there are no
_to_string functions. Only one
function recodes into a string, and it is
recode_string, which has
already been discussed separately
above.
- buffer
-
A buffer is a sequence of bytes held in
computer memory. For input, two arguments
provide a pointer to the start of the
buffer and its byte size. Note that for
charsets using many bytes per character,
the size is given in bytes, not in
characters.
For output, three arguments provide the
addresses of three variables, which will
receive the buffer pointer, the used buffer
size in bytes, and the allocated buffer
size in bytes. If, at the time of the call,
the buffer pointer is NULL,
then the allocated buffer size should also
be zero, and the buffer will be allocated
afresh by the recoding functions. However,
if the buffer pointer is not
NULL, the buffer should already be
allocated, and the allocated buffer size then
gives its size. If the allocated size gets
exceeded while the recoding proceeds, the
buffer will be automatically reallocated
bigger, probably elsewhere, and the
allocated buffer size will be adjusted
accordingly.
The second variable, giving the
used buffer size, will receive the
exact byte size needed for the
recoding. A NUL character is
guaranteed at the end of the produced
buffer, but is not counted in the byte size
of the recoding. Beyond that
NUL, there may be some extra
space after the recoded data, extending to
the allocated buffer size.
- file
-
A file is a sequence of bytes held outside
computer memory, but buffered through it.
For input, one argument provides a pointer
to a file already opened for reading. The file
is then read and recoded from its current
position until the end of the file,
effectively swallowing it into memory if the
destination of the recoding is a buffer.
For reading a file filtered through the
recoding library, but only a little at
a time, one should rather use
recode_filter_open and
recode_filter_close (these two
functions are not yet available).
For output, one argument provides a
pointer to a file already opened for writing.
The result of the recoding is written to
that file starting at its current
position.
The
following special function is still subject to
change:
void recode_format_table (request, language, "name");
and is not documented for
now.