Free recode package



4.2 Request level functions

The request level functions are meant to cover most recoding needs programmers may have; they should provide all usual functionality. Their API is almost stable by now.

To get started with request level functions, here is a full example of a program whose sole job is to filter ibmpc code on its standard input into latin1 code on its standard output.

     #include <stdbool.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <recode.h>
     
     const char *program_name;
     
     int
     main (int argc, char *const *argv)
     {
       program_name = argv[0];
       RECODE_OUTER outer = recode_new_outer (true);
       RECODE_REQUEST request = recode_new_request (outer);
       bool success;
     
       recode_scan_request (request, "ibmpc..latin1");
     
       success = recode_file_to_file (request, stdin, stdout);
     
       recode_delete_request (request);
       recode_delete_outer (outer);
     
       exit (success ? EXIT_SUCCESS : EXIT_FAILURE);
     }

The header file recode.h declares a RECODE_REQUEST structure, which the programmer should use for allocating a variable in their program. This request variable is given as the first argument to all request level functions and, in most cases, may be considered opaque.

Suppose an application is doing a lot of recoding using only a few different requests. For speed considerations, the RECODE_REQUEST structure should ideally be cached for each kind of request, so the request level initialisation is not redone for each and every string translated. The speedup should be more apparent when Recode is able to optimize the work by building on the fly, within the structure, new specialized recoding steps and their associated data tables.

  • Initialisation functions
              RECODE_REQUEST recode_new_request (outer);
              bool recode_delete_request (request);
    

    No request variable may be used in other request level functions of the recoding library before having been initialised by recode_new_request. There may be many such request variables, in which case they are independent of one another and they all need to be initialised separately. To avoid memory leaks, a request variable should not be initialised a second time without first calling recode_delete_request to “un-initialise” it.

    As with recode_delete_outer, the call to recode_delete_request prior to program termination, in the example above, may be left out.

  • Fields of struct recode_request

    Here are the fields of a struct recode_request which may be meaningfully changed once a request has been initialised by recode_new_request, but before it gets used. In practice, these fields seldom need to be changed. To access the fields, you need to include recodext.h instead of recode.h, in which case there is also a greater chance that you will need to recompile your programs if a new version of the recoding library gets installed.

    verbose_flag
    This field is initially false. When set to true, the library will echo to stderr the sequence of elementary recoding steps needed to achieve the requested recoding.

    diaeresis_char
    This field is initially the ASCII value of a double quote ", but it may also be set to the ASCII value of a colon :. In the texte charset, some countries use double quotes to mark diaeresis, while other countries prefer colons. This field holds the diaeresis character for the texte charset.

    make_header_flag
    This field is initially false. When set to true, it indicates that the program is merely trying to produce a recoding table in source form rather than completing any actual recoding. In such a case, the optimisation of step sequence can be attempted much more aggressively. If the step sequence cannot be reduced to a single step, table production will fail.

    diacritics_only
    This field is initially false. For the HTML and LaTeX charsets, it is often convenient to recode only the diacriticized characters, leaving alone other HTML code using ampersands or angle brackets, or LaTeX code using backslashes. Set the field to true to get this behaviour. In the other charset, one can then edit text as well as HTML or LaTeX directives.

    ascii_graphics
    This field is initially false, and relates to characters 176 to 223 in the ibmpc charset, which are used to draw boxes. When set to true, while getting out of ibmpc, ASCII characters are selected so as to graphically approximate these boxes.
  • Study of request strings
              bool recode_scan_request (request, "string");
    

    The main role of a request variable is to describe a set of recoding transformations. Function recode_scan_request studies the given string, and stores an internal representation of it into request. Note that string may be a full-fledged Recode request, possibly including surface specifications, intermediary charsets, sequences, aliases or abbreviations (see Requests).

    The internal representation automatically receives some pre-conditioning and optimisation, so the request may later be used many times to achieve many actual recodings. Calling recode_scan_request many times with the same string would not be efficient; it is better to keep many request variables instead.

  • Actual recoding jobs

    Once the request variable holds the description of a recoding transformation, a few functions use it for achieving an actual recoding. Either input or output of a recoding may be a string, an in-memory buffer, or a file.

    Functions with names like recode_input-type_to_output-type request an actual recoding, and are described below. It is easy to remember which arguments each function accepts, once you have grasped some simple principles for each possible type. However, one of the recoding functions escapes these principles and is discussed separately, first.

              char *recode_string (request, string);
    

    The function recode_string recodes string according to request, and directly returns the resulting recoded string freshly allocated, or NULL if the recoding could not succeed for some reason. When this function is used, it is the responsibility of the programmer to ensure that the memory used by the returned string is later reclaimed.

              bool recode_string_to_buffer (request,
                input_string,
                &output_buffer, &output_length, &output_allocated);
              bool recode_string_to_file (request,
                input_string,
                output_file);
              bool recode_buffer_to_buffer (request,
                input_buffer, input_length,
                &output_buffer, &output_length, &output_allocated);
              bool recode_buffer_to_file (request,
                input_buffer, input_length,
                output_file);
              bool recode_file_to_buffer (request,
                input_file,
                &output_buffer, &output_length, &output_allocated);
              bool recode_file_to_file (request,
                input_file,
                output_file);
    

    All these functions return a bool result, false meaning that the recoding was not successful, often because of reversibility issues. The name of each function indicates which type it reads from and which type it produces. Let's discuss these three types in turn.

    string
    A string is merely an in-memory buffer which is terminated by a NUL character (using as many bytes as needed), instead of being described by a byte length. For input, a pointer to the buffer is given through one argument.

    It is notable that there are no to_string functions. Only one function recodes into a string, and it is recode_string, which has already been discussed separately, above.

    buffer
    A buffer is a sequence of bytes held in computer memory. For input, two arguments provide a pointer to the start of the buffer and its byte size. Note that for charsets using many bytes per character, the size is given in bytes, not in characters.

    For output, three arguments provide the addresses of three variables, which will receive the buffer pointer, the used buffer size in bytes, and the allocated buffer size in bytes. If, at the time of the call, the buffer pointer is NULL, then the allocated buffer size should also be zero, and the buffer will be allocated afresh by the recoding functions. However, if the buffer pointer is not NULL, it should point to an already allocated buffer, and the allocated buffer size then gives its size. If the allocated size gets exceeded while the recoding proceeds, the buffer will be automatically reallocated bigger, probably elsewhere, and the allocated buffer size will be adjusted accordingly.

    The second variable, giving the in-memory buffer size, will receive the exact byte size which was needed for the recoding. A NUL character is guaranteed at the end of the produced buffer, but is not counted in the byte size of the recoding. Beyond that NUL, there might be some extra space after the recoded data, extending to the allocated buffer size.

    file
    A file is a sequence of bytes held outside computer memory, but buffered through it. For input, one argument provides a pointer to a file already opened for reading. The file is then read and recoded from its current position until the end of the file, effectively swallowing it in memory if the destination of the recoding is a buffer. For reading a file filtered through the recoding library, but only a little bit at a time, one should rather use recode_filter_open and recode_filter_close (these two functions are not yet available).

    For output, one argument provides a pointer to a file already opened for writing. The result of the recoding is written to that file starting at its current position.

The following special function is still subject to change:

     void recode_format_table (request, language, "name");

and is not documented for now.