README file for Recode
Here is version 3.6 for the Recode program
and library. Hereafter, Recode means the
whole package, recode means the executable
program. Glance through this README file
before starting configuration. Make sure you
read files ABOUT-NLS and INSTALL if you
are not familiar with them already.
The Recode library converts files between
character sets and usages. It recognises or
produces over 200 different character sets
(or about 300 if combined with an
iconv library)
and transliterates files between almost any
pair. When exact transliteration is not
possible, it gets rid of offending characters
or falls back on approximations. The
recode program
is a handy front-end to the library.
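The two fallback strategies just described (getting rid of offending characters, or approximating them) can be illustrated with a minimal Python sketch. It relies only on Python's standard codecs and unicodedata module, not on the Recode library, and the function names drop_offending and approximate are made up for this illustration. The approximations Recode itself chooses may well differ.

```python
import unicodedata


def drop_offending(text, charset="ascii"):
    """Encode text, silently dropping characters the charset cannot hold."""
    return text.encode(charset, errors="ignore")


def approximate(text, charset="ascii"):
    """Encode text, approximating accented letters by their base letters."""
    # NFKD decomposition splits "ç" into "c" plus a combining cedilla.
    # The combining mark is then dropped, since the charset cannot hold it.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode(charset, errors="ignore")


sample = "François"
print(drop_offending(sample))  # the "ç" disappears entirely
print(approximate(sample))     # the "ç" is approximated by "c"
```

Dropping loses information silently, while approximating keeps the text readable at the cost of exactness, which is why a converter may want to offer both.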
The Recode program and library were
written by François Pinard, yet they
significantly reuse tabular works from Keld
Simonsen. It is an evolving package, and
specifications might change in future
releases.
On various Unix systems, Recode is usually
compiled from sources; see the Installation section
below. On Linux, it often comes bundled.
Recode has been ported to other popular
systems. See both contrib/README and the
Non-Unix ports section
below, to find some more information about
these.
Send bug reports to mailto:recode-bugs@iro.umontreal.ca.
A bug report is an adequate description of
the problem: your input, what you expected,
what you got, and why this is wrong. Diffs
are welcome, but they only describe a
solution, from which the problem might be
hard to infer. If needed, submit actual
data files with your report. Small data files
are preferred. Big files may sometimes be
necessary, but do not send them to the
mailing list; rather, make special arrangements
with the maintainer.
Your feedback will help us to make a
better and more portable package. Consider
documentation errors as bugs, and report them
as such. If you develop anything pertaining
to Recode or have suggestions, let us know
and share your findings by writing to
mailto:recode-forum@iro.umontreal.ca.
You may also choose to write directly to
mailto:pinard@iro.umontreal.ca,
yet be warned that such correspondence is
often visible for a while through the Recode
Web site.
If you feel like receiving releases and
pretest announcements for the Recode package,
send a message to mailto:majordomo@iro.umontreal.ca
having, in its body, a line saying:
subscribe recode-announce
If you would rather participate actively
in discussions, pretesting and development
for Recode, do just as above, but this time,
use:
subscribe recode-forum
Visit http://recode.progiciels-bpi.ca/
for releases or pretests, and related files.
In particular, button Browse gives access
to a weekly mirror of the current unpackaged
work files, while button Folders gives access
to saved or pending correspondence.
Please do not widely redistribute
releases having a letter after the version
numbers, as these are meant for pretesting
only, and might not be stable enough for
other usages.
My plan has long been to end the 3.x
series of this package, aiming instead for 4.0 as
a major internal rewrite. As there is still a
long way before 4.0 gets ready, and
especially because some of my good
collaborators insisted that I do so, there
will be a Recode 3.7. That release is meant
to provide a selection of user-contributed
patches.
For prototyping what Recode will become
and experimenting with new concepts more easily, I
created a subsidiary and standalone project
named Recodec, meant to receive the best part
of my development efforts in this particular
area. Once I am happy with the prototype,
the plan is to rewrite it from Python to C,
somehow. Visit the Web pages for this
Recodec
project for more information and details.
For now at least, new features go to Recodec
only.
Here are a few notes related to the beta2
pre-test release for the upcoming Recode 3.7.
I publish it to ease later exchanges of
patches with testers.
- The name has been changed from Free
recode to Recode -- as "Free" was a four
letter word to some people :-).
recode (no
capital) still names the executable program
specifically, or the distribution archive
itself.
- Recode does not itself include
libiconv
anymore. However, it uses an external
iconv library
if one is available at installation time,
like libiconv
or the one provided within GNU
libc. The
-x: option to the
program, or a new flag to the library
recode_new_outer function,
inhibits the initialisation and usage of
iconv.
- The bug about losing a few characters,
here and there, when recoding big files in
iconv
context, seems to have been corrected. A
patch for this problem has been floating
around for years, but it was not solving
all cases.
- Recode installation now uses Python. In
particular, it creates file build/src/iconvdecl.h
from local iconv -l output. Recode testing
through make
check also needs what people call
python-devel,
providing C header files for Python and
distutils.
The Makemore file
has been merged within regular Makefiles
and is not distributed separately
anymore.
- It is likely that new bugs have been
introduced through the above changes. In
particular, not everything is cosy on the
side of release engineering. A few files
are either spuriously remade, or remade
late. I'm a bit surprised by how difficult
it is to get this right.
- make
check accepts a LIMIT= option, for
limiting tests to one or a few cases. See
tests/Makefile
for more information.
- PO files have been updated from the
Translation Project.
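As noted in the list above, installation derives build/src/iconvdecl.h from the local iconv -l output. The following Python sketch shows one way such a step could be written. It is not the actual installation script: the function names, the array name iconv_name_list and the layout of the generated C text are all hypothetical, and the parser assumes that the input holds one charset per line, with aliases separated by blanks or commas and an optional trailing "//" on each name.

```python
def parse_iconv_list(output):
    """Return a list of alias groups, one group per charset line.

    Assumes aliases are separated by whitespace or commas, and that
    each name may carry a trailing "//", which is removed.
    """
    groups = []
    for line in output.splitlines():
        names = [name.rstrip("/") for name in line.replace(",", " ").split()]
        names = [name for name in names if name]
        if names:
            groups.append(names)
    return groups


def render_header(groups, array_name="iconv_name_list"):
    """Render alias groups as C source text (hypothetical layout).

    Each group is followed by NULL, and a last NULL ends the array.
    """
    lines = ["static const char *%s[] =" % array_name, "  {"]
    for names in groups:
        quoted = ", ".join('"%s"' % name for name in names)
        lines.append("    %s, NULL," % quoted)
    lines.append("    NULL")
    lines.append("  };")
    return "\n".join(lines) + "\n"


# Sample input standing in for real `iconv -l` output.
sample = "ANSI_X3.4-1968 ASCII US-ASCII\nUTF-8 UTF8\n"
print(render_header(parse_iconv_list(sample)))
```

Real input could be captured with subprocess.run(["iconv", "-l"], capture_output=True, text=True).stdout, keeping in mind that some iconv implementations print explanatory text before the list, which this sketch does not filter out.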
The beta 1 pre-test release for the
upcoming Recode 3.7 has been made available
for those needing it right away. While it
solves some serious bugs and portability
problems, others are meant to be addressed
only in later pre-tests. In particular, none
of the charset or surface issues, user requests,
and various suggestions are addressed in this
pre-test, nor will they be in later
pre-tests, until all real show-stoppers are
solved. So this is in no way a
candidate for a Recode 3.7 release.
The test suite is worth more comments:
- The suite is very partial, and should not
be thought of as a validation suite. Before it
could be used to establish confidence, it
would need many more tests than it has
already.
- Testing is notably faster than it
used to be. For example, the previous
bigauto test,
which was not run by default because it ran
for too long, is now executed within the
standard test suite, once in non-strict
mode, and a second time in strict
mode.
- It does not use Autotest anymore, but
rather a home-grown test driver much
inspired by the Codespeak project. The
link between the test and the Recode
library is established through a Pyrex
interface, so you need to have
python and
python-devel
installed first.
- Beware that the Pyrex interface to the
Recode library is only meant for testing,
for now at least. While you may play with
it, it would not be wise to rely on it, as
the specifications might change at any
time.
Simple installation of Recode requires the
same tools and facilities as those needed
for most GNU packages. If not already bundled
with your system, you also need to
pre-install Python, version 2.2 or better.
You may get it from:
http://www.python.org
It is also convenient to have some
iconv library
already present on your system, as this greatly
extends Recode capabilities, especially in
the area of Asian character sets. GNU
libc, as found
on Linux systems and a few others, already
has such an iconv library. Otherwise, you
might consider pre-installing the portable
libiconv,
written by Bruno Haible. You may get it
from:
http://www.gnu.org/software/libiconv/
Source files and various distributions
(either latest, pretest, or archive) are
available through:
https://github.com/pinard/Recode/
File timestamps after checkout may trigger
Make difficulties. As a way to avoid these,
from the top level of the distribution,
execute sh
after-patch.sh
before configuring. If you lack either
sh or GNU
touch, try
python
after-patch.py
instead.
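To show the idea behind such a timestamp fix, here is a minimal Python sketch. It is not the content of after-patch.sh or after-patch.py: the function name touch_in_order and the list of file names are hypothetical, picked only as typical generated files in an Autoconf and Automake tree. The principle is to refresh the modification times of generated files, in dependency order, so that Make does not consider them older than the files they derive from.

```python
import os


def touch_in_order(paths):
    """Refresh the mtime of each existing file, in the order given.

    Files touched later never end up older than files touched earlier,
    so listing prerequisites first keeps Make from rebuilding targets.
    Returns the list of paths actually touched.
    """
    touched = []
    for path in paths:
        if os.path.exists(path):
            os.utime(path, None)  # set access and modification times to now
            touched.append(path)
    return touched


# Hypothetical ordering: each file may depend on those listed before it.
GENERATED_IN_ORDER = ["aclocal.m4", "configure", "config.h.in", "Makefile.in"]

if __name__ == "__main__":
    for name in touch_in_order(GENERATED_IN_ORDER):
        print("touched", name)
```

Missing files are skipped rather than created, so the sketch stays harmless when run outside a source tree.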
For simple modifications to Recode, you
should not need special tools beyond those
usual for installing GNU packages. However,
if you modify any .l source file,
Python and Flex are both needed for remaking
merged.c.
For more comprehensive modifications, you
might need more tools. If not done already,
make sure you have a copy of the packages
listed in the following table. You may also
choose to establish a link in your build
doc/
directory, as explained within doc/Makemore.
| Package name | Current | Minimum | Install after |
| autoconf | 2.61 | 2.12 | m4 |
| automake | 1.10 | 1.9 | Perl |
| Flex | 2.5.33 | 2.5.4a | |
| gettext | 0.16 | 0.16 | |
| Help2man | 1.36 | 1.020 | Perl |
| libtool | 1.5.24 | 1.3.4 | |
| m4 | 1.4.10 | 1.4n | |
| Make | 3.81 | | |
| Perl | 5.8.8 | 5.005.03 | |
| Python | 2.5.1 | 2.2 | |
| tar | 1.17 | 1.12 | |
| wget | 1.10.2 | | |
The current version numbers just
happen to be those used for development; it
is likely that older versions would
work just as well. The minimum
version numbers were once acceptable, but they
might not be anymore, as this has not been
verified. Any updating information is
welcome!
Here are a few hints which might help
installing Recode on some systems. Many may
be applied by temporarily presetting
environment variables while calling
./configure.
File INSTALL explains
this.
-
Compilation time
Some C compilers, like Apollo's, have
a hard time compiling merged.c. If
this is your case, avoid compiler
optimisation. From within the Bourne
shell, you may use:
CFLAGS= ./configure
But if you want to give a real hard
time to your C optimiser on merged.c, to
get code that runs only a bit faster,
merely try:
CPPFLAGS=-DINLINE_HARDER ./configure
-
Smallish systems
For 80286 based systems (do some still
exist?!), it has been reported that some
compilers generate wrong code while
optimising for small models. So,
from within the Bourne shell, do:
CFLAGS=-Ml LDFLAGS=-Ml ./configure
to force the large memory model. For the 80286
Xenix compiler, when it was last
tried a while ago, one had to use:
CFLAGS='-Ml -F2000' LDFLAGS=-Ml ./configure
Other systems have poor pipe/popen support or thrash
heavily when processes fork. In this
case, just before doing make, edit
config.h and
ensure HAVE_PIPE is not
defined.
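The hints above all preset environment variables for a single ./configure call. When a build is driven from a script rather than typed at the Bourne shell, the same effect can be obtained as in the following Python sketch. The helper names configure_env and run_configure are hypothetical and are not part of Recode. Only the variable names and values come from the hints above.

```python
import os
import subprocess


def configure_env(presets, base=None):
    """Return a copy of the environment with presets applied on top.

    The calling process environment itself is left untouched, which
    mirrors the temporary nature of `VAR=value ./configure`.
    """
    env = dict(os.environ if base is None else base)
    env.update(presets)
    return env


def run_configure(presets, script="./configure"):
    """Run the configure script with temporary presets; return its status."""
    return subprocess.call([script], env=configure_env(presets))


# Equivalent of: CFLAGS= ./configure
NO_OPTIMISATION = {"CFLAGS": ""}

# Equivalent of: CFLAGS=-Ml LDFLAGS=-Ml ./configure
LARGE_MODEL = {"CFLAGS": "-Ml", "LDFLAGS": "-Ml"}
```

From the top level of the distribution, run_configure(NO_OPTIMISATION) would then correspond to the first hint, and run_configure(LARGE_MODEL) to the large memory model one.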
-
IETF references
-
Various references
-
Unicode charset
mappings
The Unicode consortium makes
available plenty of charset mappings
for converting "legacy" charsets to
Unicode.
-
Normalisation et
internationalisation: Inventaire et
prospectives des normes clefs pour le
traitement informatique du français.
(392p.)
This is a report, written in
French, discussing charset issues and
many other topics as well. Laurent
Bourbeau <bourbeau@progiciels-bpi.ca>
and François Pinard <pinard@iro.umontreal.ca>,
1995-10.
-
Recode specific
-
ETL presentation
In 1999, the organisers of the
m17n99 conference in Tsukuba,
Japan, were kind enough to invite me.
This has been for me a fabulous trip
and experience, and I met many
extraordinary people there. At the
conference, I presented the
Translation Project, and Recode. The
Recode presentation
slides are available.
-
libiconv
This comprehensive charset converter
library revolves around Unicode, and
supports Asian encodings among many
others. Even Recode uses it!
Bruno Haible <haible@ilog.fr>
-
tcs
Here is the main recoding tool from
the Plan9 project.
-
yudit
This GUI editor handles many
encodings, among which UTF-8. It also
installs uniconv, a recoding program, and
uniprint, a printing tool.
Gaspar Sinai <gsinai@iname.com>,
1999-01.
-
ucs-fonts
These 6x13 fonts, covering Unicode
characters besides the Asian sets,
merely replace the Linux fixed 6x13
font. Works nicely with yudit.
Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>,
1998-11.
-
MtRecode
This charset converter is oriented
towards SGML text manipulation. It may be
freely downloaded for non-commercial,
non-military use from:
Pointer given by Jean Véronis
<veronis@univ-aix.fr>,
1996-06.
-
sp
This quite nice SGML structure
analyser contains internal C++ modules
for handling many charsets.
James Clark <jjc@jclark.com>
-
b2c
This program is able to generate
interpreted character dumps, but properly
embedded within complete C header
files.
Jörg Heitkötter <Joerg.Heitkoetter@de.uu.net>,
1997-11.
-
PyRecode
This wrapper provides Recode
functionality to Python programs.
Andreas Jung <ajung@server.python.net>
Also see:
Please write to mailto:recode-bugs@iro.umontreal.ca
if you are aware of various ports to non-Unix
systems not listed here, or for corrections.
Please provide the goal system, a complete
and stable URL, the maintainer name and
address, the Recode version used as a base,
and your comments.
-
MSDOS (DJGPP)
Juan Manuel Guerrero <juan.guerrero@gmx.de>
maintains this port, dated 2001-03 and
based on Recode 3.5. The following
archives hold binaries, docs and sources
respectively.
See contrib/DJGPP/README
in the Recode distribution for more
information about compiling this
port.
-
MSDOS (Gnuish)
Darrel Hankerson <hankedr@mail.auburn.edu>
maintains this port, dated 1994-11 and
based on Recode 3.4. You get many GNU
tools, not only Recode. The GNUish
project is described in gnuish_t.htm.
-
OS/2 (using emx/gcc)
Maintainer unknown (maybe Kai Uwe
Rommel <rommel@ars.de>),
dated 1994-11 and based on Recode
3.4.