README file for Recode
Here is version 3.6 for the Recode program and library.
Hereafter, Recode means the whole package, recode means the executable program. Glance through
this README file before
starting configuration. Make sure you read files ABOUT-NLS and
INSTALL if you are not
familiar with them already.
The Recode library converts files between character sets and
usages. It recognises or produces over 200 different character sets
(or about 300 if combined with an iconv library) and transliterates files between
almost any pair. When exact transliteration are not possible, it
gets rid of offending characters or falls back on approximations.
The recode program is a handy
front-end to the library.
The Recode program and library have been written by François
Pinard, yet it significantly reuses tabular works from Keld
Simonsen. It is an evolving package, and specifications might
change in future releases.
On various Unix systems, Recode is usually compiled from
sources, see the Installation section below. On Linux, it often
comes bundled. Recode had been ported to other popular systems. See
both contrib/README and the Non-Unix ports
section below, to find some more information about these.
Send bug reports to mailto:recode-bugs@iro.umontreal.ca'
. A bug report is an adequate description of the problem: your
input, what you expected, what you got, and why this is wrong.
Diffs are welcome, but they only describe a solution, from which
the problem might be uneasy to infer. If needed, submit actual data
files with your report. Small data files are preferred. Big files
may sometimes be necessary, but do not send them on the mailing
list; rather take special arrangement with the maintainer.
Your feedback will help us to make a better and more portable
package. Consider documentation errors as bugs, and report them as
such. If you develop anything pertaining to Recode or have
suggestions, let us know and share your findings by writing at
mailto:recode-forum@iro.umontreal.ca
. You may also choose to directly write at mailto:pinard@iro.umontreal.ca,
yet be warned that such correspondence is often visible for a while
through the Recode Web site.
If you feel like receiving releases and pretest announcements
for the Recode package, send a message to mailto:majordomo@iro.umontreal.ca
having, in its body, a line saying:
subscribe recode-announce
If you rather want to participate actively in discussions,
pretesting and development for Recode, do just as above, but this
time, use:
subscribe recode-forum
Visit http://recode:8000/
for releases or pretests, and related files. In particular, button
Browse gives access to a weekly
mirror of the current unpackaged work files, while button
Folders gives access to saved or
pending correspondence.
Please do not widely redistribute releases having a
letter after the version numbers, as these are meant for pretesting
only, and might not be stable enough for other usages.
My plan has long been to end the 3.x series of this package,
rather aiming 4.0 as a major internal rewrite. As there is still a
long way before 4.0 gets ready, and especially because
some of my good collaborators insisted that I do so, there will be
a Recode 3.7. That release is meant to provide a selection of
user-contributed patches.
For prototyping what Recode will become and experimenting new
concepts more easily, I created a subsidiary and standalone project
named Recodec, meant to receive the best part of my development
efforts in this particular area. Once I'll be happy with the
prototype, the plan is to rewrite it from Python to C, somehow.
Visit the Web pages for this Recodec project for more
information and details. For now at least, new features go to
Recodec only.
Here are a few notes related to the beta2 pre-test release for
the incoming Recode 3.7. I publish it to ease later exchanges of
patches with testers.
- The name has been changed from Free recode to Recode -- as
"Free" was a four letter word to some people :-). recode (no capital) still names the executable
program specifically, or the distribution archive itself.
- Recode does not itself include libiconv anymore. However, it uses an external
iconv library if one is available at
installation time, like libiconv or
the one provided within GNU libc. The
-x:
option to the program, or a new flag to the library recode_new_outer function, inhibits the
initialisation and usage of iconv.
- The bug about loosing a few characters, here and there, when
recoding big files in iconv context,
seems to have been corrected. A patch for this problem has been
floating around for years, but it was not solving all cases.
- Recode installation now uses Python. In particular, it creates
file build/src/iconvdecl.h
from local iconv -l output. Recode testing through make check also needs what people
python-devel, providing C header
files for Python and distutils. The
Makemore file has been
merged within regular Makefiles and is not distributed separately
anymore.
- It is likely that new bugs have been introduced through the
above changes. In particular, not everything is cosy on the side of
release engineering. A few files are either spuriously remade, or
remade late. I'm a bit surprised by the difficulty to get this
right.
- make check accepts a
LIMIT= option, for limiting tests
to one or a few cases. See tests/Makefile for more
information.
- PO files have been updated from the Translation Project.
The beta 1 pre-test release for the incoming Recode 3.7 has been
made available for those needing it right away. While it solves
some serious bugs and portability problems, others are meant to be
addressed only in later pre-tests. In particular, none of charset
or surface issues, user requests, and various suggestions appear in
this pre-test, and will not either in later pretests, until all
real show-stoppers are solved first. So this is in no way a
candidate for a Recode 3.7 release.
The test suite is worth more comments:
- The suite is very partial, and may not be thought as a
validation suite. Before it could be used to ascertain confidence,
it would need much more tests than it has already.
- Testing is notably more speedy than it used to be. For example,
the previous bigauto test, which was
not run by default because it ran for too long, is now executed
within the standard test suite, once in non-strict mode, and a
second time in strict mode.
- It does not use Autotest anymore, but rather a home grown test
driver much inspired from the Codespeak project. The link between
the test and the Recode library is established through a Pyrex
interface, so you need to have python
and python-devel installed
first.
- Beware that the Pyrex interface to the Recode library is only
meant for testing, for now at least. While you may play with it, it
would not be wise relying on it, as the specifications might change
at any time.
Simple installation of Recode requires the usual tools and
facilities as those needed for most GNU packages. If not already
bundled with your system, you also need to pre-install Python,
version 2.2 or better. You may get it from:
http://www.python.org
It is also convenient to have some iconv library already present on your system, this
much extends Recode capabilities, especially in the area of Asiatic
character sets. GNU libc, as found on
Linux systems and a few others, already has such an iconv library. Otherwise, you might consider
pre-installing the portable libiconv,
written by Bruno Haible. You may get it from:
http://www.gnu.org/software/libiconv/
Source files and various distributions (either latest, prestest,
or archive) are available through:
https://github.com/pinard/Recode/
File timestamps after checkout may trigger Make difficulties. As
a way to avoid these, from the top level of the distribution,
execute sh after-patch.sh before configuring. If you miss
either sh or GNU touch, try python
after-patch.py instead.
For simple modifications to Recode, you should not need special
tools beyond those usual for installing GNU packages. However, if
you modify any .l source
file, Python and Flex are both needed for remaking merged.c.
For more comprehensive modifications, you might need more tools.
If not done already, make sure you have a copy of the packages
listed in the following table. You may also choose to establish a
link in your build doc/
directory, as explained within doc/Makemore.
| Package name |
Current |
Minimum |
Install after |
| autoconf |
2.61 |
2.12 |
m4 |
| automake |
1.10 |
1.9 |
Perl |
| Flex |
2.5.33 |
2.5.4a |
|
| gettext |
0.16 |
0.16 |
|
| Help2man |
1.36 |
1.020 |
Perl |
| libtool |
1.5.24 |
1.3.4 |
|
| m4 |
1.4.10 |
1.4n |
|
| Make |
3.81 |
|
|
| Perl |
5.8.8 |
5.005.03 |
|
| Python |
2.5.1 |
2.2 |
|
| tar |
1.17 |
1.12 |
|
| wget |
1.10.2 |
|
|
The current version numbers just happen to be those
used for development, it is often likely that older versions would
work just as well. The minimum version numbers were once
acceptable, they might not be anymore, this has not been verified;
any updating information is welcome!
Here are a few hints which might help installing Recode on some
systems. Many may be applied by temporary presetting environment
variables while calling ./configure. File INSTALL explains this.
-
Compilation time
Some C compilers, like Apollo's, have a hard time compiling
merged.c. If this is your
case, avoid compiler optimisation. From within the Bourne shell,
you may use:
CFLAGS= ./configure
But if you want to give a real hard time to your C optimiser on
merged.c, to get code that
runs only a bit faster, merely try:
CPPFLAGS=-DINLINE_HARDER ./configure
-
Smallish systems
For 80286 based systems (do some still exist?!), it has been
reported that some compilers generate wrong code while optimising
for small models. So, from within the Bourne shell,
do:
CFLAGS=-Ml LDFLAGS=-Ml ./configure
to force large memory model. For 80286 Xenix compiler, the last
time it was tried a while ago, one ought to use:
CFLAGS='-Ml -F2000' LDFLAGS=-Ml ./configure
Other systems have poor pipe/popen support or
thrash heavily when processes fork. In this case, just before doing
make, edit config.h and ensure HAVE_PIPE is not defined.
-
IETF references
-
Various references
-
Unicode charset mappings
The Unicode consortium makes available plenty of charset
mappings for converting "legacy" charsets to Unicode.
-
Normalisation et internationalisation: Inventaire
et prospectives des normes clefs pour le traitement informatique du
français. (392p.)
This is a report, written in French, discussing charset issues
and many other topics as well. Laurent Bourbeau <bourbeau@progiciels-bpi.ca>
and François Pinard <pinard@iro.umontreal.ca>,
1995-10.
-
Recode specific
-
ETL presentation
In 1999, the organisers of the
m17n99 conference in Tsukuba, Japan, were kind enough to invite
me. This has been for me a fabulous trip and experience, and I met
many extraordinary people in there. At the conference, I presented
the Translation Project, and Recode. The Recode presentation slides are
available.
-
libiconv
This comprehensive charset converter library revolves around
Unicode, and support Asian encodings among many others. Even Recode
uses it!
Bruno Haible <haible@ilog.fr>
-
tcs
Here is the main recoding tool from the Plan9 project.
-
yuedit
This GUI editor handles many encodings, among which UTF-8. It
also installs uniconv, a recoding program, and uniprint, a printing
tool.
Gaspar Sinai <gsinai@iname.com>, 1999-01.
-
ucs-fonts
These 6x13 fonts, covering Unicode characters besides the Asian
sets, merely replace the Linux fixed 6x13 font. Works nicely with
yudit.
Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>,
1998-11.
-
MtRecode
This charset converter is oriented towards SGML text
manipulation. It may be freely downloaded for non-commercial,
non-military use from:
Pointer given by Jean Véronis <veronis@univ-aix.fr>,
1996-06.
-
sp
This quite nice SGML structure analyser contains internal C++
modules for handling many charsets.
James Clark <jjc@jclark.com>
-
b2c
This program is able to generate interpreted character dumps,
but properly embedded within complete C header files.
Jörg Heitkötter <Joerg.Heitkoetter@de.uu.net>,
1997-11.
-
PyRecode
This wrapper provides Recode functionality to Python
programs.
Andreas Jung <ajung@server.python.net>
Also see:
Please mailto:recode-bugs@iro.umontreal.ca
if you are aware of various ports to non-Unix systems not listed
here, or for corrections. Please provide the goal system, a
complete and stable URL, the maintainer name and address, the
Recode version used as a base, and your comments.
-
MSDOS (DJGPP)
Juan Manuel Guerrero <juan.guerrero@gmx.de>
maintains this port, dated 2001-03 and based on Recode 3.5. The
following archives hold binaries, docs and sources
respectively.
See contrib/DJGPP/README in the Recode distribution
for more information about compiling this port.
-
MSDOS (Gnuish)
Darrel Hankerson <hankedr@mail.auburn.edu>
maintains this port, dated 1994-11 and based on Recode 3.4. You get
many GNU tools, not only Recode. The GNUish project is described in
gnuish_t.htm.
-
OS/2 (using emx/gcc)
Maintainer unknown (maybe Kai Uwe Rommel <rommel@ars.de>), dated 1994-11 and
based on Recode 3.4.
|