README file for Recode
Here is version 3.6 for the Recode program
and library. Hereafter, Recode means the
whole package, recode means
the executable program. Glance through this
README
file before starting configuration. Make sure
you read files ABOUT-NLS and INSTALL if you are not
familiar with them already.
The Recode library converts files between
character sets and usages. It recognises or
produces over 200 different character sets
(or about 300 if combined with an
iconv library) and
transliterates files between almost any pair.
When exact transliteration is not possible,
it gets rid of offending characters or falls
back on approximations. The
recode program is a handy
front-end to the library.
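The two fallback strategies just mentioned, dropping offending characters or approximating them, can be illustrated outside Recode. The sketch below uses only the Python standard library, not the Recode library or its API, and the function names are made up for this illustration.

```python
import unicodedata


def drop_offending(text, charset="ascii"):
    """Encode text, silently dropping characters the charset cannot hold."""
    return text.encode(charset, errors="ignore")


def approximate(text, charset="ascii"):
    """Encode text after reducing accented letters to their base letters."""
    # NFKD splits an accented letter into its base letter plus a combining
    # mark; the mark is then dropped by the "ignore" error handler.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode(charset, errors="ignore")


sample = "François"
print(drop_offending(sample))  # b'Franois'
print(approximate(sample))     # b'Francois'
```

Recode's own approximation tables are far richer than this decomposition trick; the point is only the difference between losing a character and replacing it by a near equivalent.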
The Recode program and library were
written by François Pinard, yet they
significantly reuse tabular works from Keld
Simonsen. It is an evolving package, and
specifications might change in future
releases.
On various Unix systems, Recode is usually
compiled from sources; see the Installation section
below. On Linux, it often comes bundled.
Recode has been ported to other popular
systems. See both contrib/README and the
Non-Unix ports section
below for more information about
these.
Send bug reports to mailto:recode-bugs@iro.umontreal.ca.
A bug report is an adequate description of
the problem: your input, what you expected,
what you got, and why this is wrong. Diffs
are welcome, but they only describe a
solution, from which the problem might be
hard to infer. If needed, submit actual
data files with your report. Small data files
are preferred. Big files may sometimes be
necessary, but do not send them to the
mailing list; rather make special arrangements
with the maintainer.
Your feedback will help us to make a
better and more portable package. Consider
documentation errors as bugs, and report them
as such. If you develop anything pertaining
to Recode or have suggestions, let us know
and share your findings by writing to
mailto:recode-forum@iro.umontreal.ca.
You may also choose to write directly to
mailto:pinard@iro.umontreal.ca,
yet be warned that such correspondence is
often visible for a while through the Recode
Web site.
If you feel like receiving releases and
pretest announcements for the Recode package,
send a message to mailto:majordomo@iro.umontreal.ca
having, in its body, a line saying:
subscribe recode-announce
If you would rather participate actively
in discussions, pretesting and development
for Recode, do just as above, but this time,
use:
subscribe recode-forum
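As a rough illustration of the two subscription requests above, the following sketch builds such a message with the Python standard library without sending it. The function name and the sender address are placeholders invented for this example; only the Majordomo address and the body line come from this README.

```python
from email.message import EmailMessage


def subscription_request(list_name, sender):
    """Build (but do not send) a subscription message for Majordomo."""
    message = EmailMessage()
    message["From"] = sender  # placeholder, use your own address
    message["To"] = "majordomo@iro.umontreal.ca"
    # The command goes in the body, on a line of its own.
    message.set_content("subscribe " + list_name)
    return message


request = subscription_request("recode-announce", "you@example.org")
print(request)
```

Passing "recode-forum" instead of "recode-announce" gives the request for the discussion list.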
Visit http://recode.progiciels-bpi.ca/
for releases or pretests, and related files.
In particular, button Browse gives access to a
weekly mirror of the current unpackaged work
files, while button Folders gives access to
saved or pending correspondence.
Please do not widely redistribute
releases having a letter after the version
numbers, as these are meant for pretesting
only, and might not be stable enough for
other usages.
My plan has long been to end the 3.x
series of this package, aiming instead for 4.0 as
a major internal rewrite. As there is still a
long way before 4.0 gets ready, and
especially because some of my good
collaborators insisted that I do so, there
will be a Recode 3.7. That release is meant
to provide a selection of user-contributed
patches.
For prototyping what Recode will become
and experimenting with new concepts more easily, I
created a subsidiary and standalone project
named Recodec, meant to receive the best part
of my development efforts in this particular
area. Once I am happy with the prototype,
the plan is to rewrite it from Python to C,
somehow. Visit the Web pages for this
Recodec
project for more information and details.
For now at least, new features go to Recodec
only.
Here are a few notes related to the beta2
pre-test release for the upcoming Recode 3.7.
I publish it to ease later exchanges of
patches with testers.
- The name has been changed from Free
recode to Recode -- as "Free" was a four
letter word to some people :-).
recode (no capital) still
names the executable program specifically,
or the distribution archive itself.
- Recode does not itself include
libiconv anymore. However,
it uses an external iconv
library if one is available at installation
time, like libiconv or the
one provided within GNU
libc. The -x: option to the
program, or a new flag to the library
recode_new_outer function,
inhibits the initialisation and usage of
iconv.
- The bug about losing a few characters,
here and there, when recoding big files in
iconv context, seems to
have been corrected. A patch for this
problem has been floating around for years,
but it was not solving all cases.
- Recode installation now uses Python. In
particular, it creates file build/src/iconvdecl.h
from local iconv -l output. Recode testing
through make check also needs what
people call python-devel,
providing C header files for Python and
distutils. The Makemore file has been
merged within regular Makefiles and is not
distributed separately anymore.
- It is likely that new bugs have been
introduced through the above changes. In
particular, not everything is cosy on the
side of release engineering. A few files
are either spuriously remade, or remade
late. I'm a bit surprised by the difficulty
of getting this right.
- make check accepts a
LIMIT= option, for
limiting tests to one or a few cases. See
tests/Makefile for more
information.
- PO files have been updated from the
Translation Project.
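Deriving build/src/iconvdecl.h from iconv -l output, as noted in the list above, has to start by collecting charset names from that listing. The sketch below is not the script shipped with Recode and does not reproduce the header's layout. It only shows how such a listing could be parsed, assuming that names are separated by whitespace or commas and may carry a trailing "//", as some iconv implementations print them. The function name is hypothetical.

```python
import re


def parse_iconv_list(listing):
    """Return the charset names found in text shaped like `iconv -l` output."""
    names = []
    for token in re.split(r"[\s,]+", listing):
        name = token.rstrip("/")  # drop the trailing "//" some iconv print
        if name and name not in names:
            names.append(name)
    return names


sample = "ANSI_X3.4-1968 ASCII US-ASCII\nISO-8859-1//\nUTF-8//\n"
print(parse_iconv_list(sample))
# ['ANSI_X3.4-1968', 'ASCII', 'US-ASCII', 'ISO-8859-1', 'UTF-8']
```

Any explanatory header text printed before the names would be picked up as names too, so real output may need filtering first.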
The beta 1 pre-test release for the
upcoming Recode 3.7 has been made available
for those needing it right away. While it
solves some serious bugs and portability
problems, others are meant to be addressed
only in later pre-tests. In particular,
charset or surface issues, user requests,
and various suggestions are not addressed in this
pre-test, nor will they be in later
pre-tests, until all real show-stoppers are
solved. So this is in no way a
candidate for a Recode 3.7 release.
The test suite deserves a few comments:
- The suite is very partial, and should not
be considered a validation suite. Before it
could be used to establish confidence, it
would need many more tests than it has
now.
- Testing is notably more speedy than it
used to be. For example, the previous
bigauto test, which was
not run by default because it ran for too
long, is now executed within the standard
test suite, once in non-strict mode, and a
second time in strict mode.
- It does not use Autotest anymore, but
rather a home-grown test driver much
inspired by the Codespeak project. The
link between the test and the Recode
library is established through a Pyrex
interface, so you need to have
python and
python-devel installed
first.
- Beware that the Pyrex interface to the
Recode library is only meant for testing,
for now at least. While you may play with
it, it would not be wise to rely on it, as
the specifications might change at any
time.
Simple installation of Recode requires the
same tools and facilities as
most GNU packages. If not already bundled
with your system, you also need to
pre-install Python, version 2.2 or better.
You may get it from:
http://www.python.org
It is also convenient to have some
iconv library already
present on your system, as this greatly extends
Recode capabilities, especially in the area
of Asian character sets. GNU
libc, as found on Linux
systems and a few others, already has such an
iconv library. Otherwise,
you might consider pre-installing the
portable libiconv, written
by Bruno Haible. You may get it from:
http://www.gnu.org/software/libiconv/
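Before configuring, the two prerequisites above can be checked quickly. The following is a minimal sketch written for a current Python 3, so it would not itself run on a 2.2 interpreter. It assumes that finding an iconv program on PATH is a reasonable hint that an iconv library is present, which is not guaranteed. The function name is hypothetical.

```python
import shutil
import sys


def check_prerequisites(minimum_python=(2, 2)):
    """Report whether Python is recent enough and an iconv program is on PATH."""
    return {
        "python": sys.version_info[:2] >= minimum_python,
        # Only a hint: the program may exist without usable library headers.
        "iconv": shutil.which("iconv") is not None,
    }


for name, found in check_prerequisites().items():
    print(name, "ok" if found else "missing")
```

The configure script remains the authority on what is actually usable at build time.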
Visit
http://github.com/pinard/Recode/tree/dev-3.7
and use the Download button to get a
packaged copy of development sources. If you
happen to be a Git lover, you may rather
use:
git clone git://github.com/pinard/Recode.git
and then check out branch dev-3.7. File timestamps
after checkout may trigger Make difficulties.
To avoid them, from the top level of the
distribution, execute sh after-patch.sh. If you lack
either sh or GNU
touch, try python after-patch.py instead.
Once you have an unpacked distribution,
see files:
| File name   | Description                                    |
| ABOUT-NLS   | how to customise this program to your language |
| COPYING     | copying conditions for the program             |
| COPYING.LIB | copying conditions for the library             |
| INSTALL     | compilation and installation instructions      |
| NEWS        | major changes in the current release           |
| THANKS      | partial list of contributors                   |
Besides those configure options documented
in files INSTALL and ABOUT-NLS, a few extra
options may be accepted after ./configure:
- Options --disable-shared or
--disable-static
inhibit the building of shared
libraries or static libraries; the
default is to always build static
libraries, and to attempt building shared
libraries if there is some known recipe
for this.
- Option --with-gnu-ld
forces the assumption that the C
compiler uses GNU ld.
- Option --with-dmalloc
triggers a debugging feature for
looking at memory management problems. It
requires Gray Watson's dmalloc package, which
is available as
ftp://ftp.letters.com/src/dmalloc/dmalloc.tar.gz.
For simple modifications to Recode, you
should not need special tools beyond those
usual for installing GNU packages. However,
if you modify any .l source file, Python and
Flex are both needed for remaking merged.c.
For more comprehensive modifications, you
might need more tools. If not done already,
make sure you have a copy of the packages
listed in the following table. You may also
choose to establish a link in your build
doc/
directory, as explained within doc/Makemore.
| Package name | Current | Minimum  | Install after |
| autoconf     | 2.61    | 2.12     | m4            |
| automake     | 1.10    | 1.9      | Perl          |
| Flex         | 2.5.33  | 2.5.4a   |               |
| gettext      | 0.16    | 0.16     |               |
| Help2man     | 1.36    | 1.020    | Perl          |
| libtool      | 1.5.24  | 1.3.4    |               |
| m4           | 1.4.10  | 1.4n     |               |
| Make         | 3.81    |          |               |
| Perl         | 5.8.8   | 5.005.03 |               |
| Python       | 2.5.1   | 2.2      |               |
| tar          | 1.17    | 1.12     |               |
| wget         | 1.10.2  |          |               |
The current version numbers just
happen to be those used for development; it
is likely that older versions would
work just as well. The minimum
version numbers were once acceptable, but they
might not be anymore, as this has not been
verified. Any updated information is
welcome!
Here are a few hints which might help
installing Recode on some systems. Many may
be applied by temporarily presetting
environment variables while calling
./configure. File
INSTALL explains this.
- Compilation time
Some C compilers, like Apollo's, have
a hard time compiling merged.c. If this is
your case, avoid compiler optimisation.
From within the Bourne shell, you may
use:
CFLAGS= ./configure
But if you want to give a real hard
time to your C optimiser on merged.c, to get code
that runs only a bit faster, merely
try:
CPPFLAGS=-DINLINE_HARDER ./configure
- Smallish systems
For 80286 based systems (do some still
exist?!), it has been reported that some
compilers generate wrong code while
optimising for small models. So,
from within the Bourne shell, do:
CFLAGS=-Ml LDFLAGS=-Ml ./configure
to force the large memory model. For the 80286
Xenix compiler, the last time it was
tried, a while ago, one ought to use:
CFLAGS='-Ml -F2000' LDFLAGS=-Ml ./configure
Other systems have poor
pipe/popen
support or thrash heavily when processes
fork. In this case, just before doing
make, edit config.h and ensure
HAVE_PIPE is
not defined.
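The config.h edit described under Smallish systems can also be scripted. The sketch below is only an illustration and not part of the Recode distribution. It assumes config.h uses the usual Autoconf form "#define HAVE_PIPE 1" and replaces any such line by a commented-out "#undef", leaving other lines untouched. The function name is hypothetical.

```python
import re


def undefine_have_pipe(config_text):
    """Comment out any `#define HAVE_PIPE ...` line in config.h text."""
    pattern = re.compile(
        r"^[ \t]*#[ \t]*define[ \t]+HAVE_PIPE\b.*$", re.MULTILINE
    )
    return pattern.sub("/* #undef HAVE_PIPE */", config_text)


sample = "#define HAVE_DUP2 1\n#define HAVE_PIPE 1\n#define HAVE_POPEN 1\n"
print(undefine_have_pipe(sample))
```

To apply it, read config.h after running ./configure, pass its text through the function, and write the result back before running make.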
- IETF references
- Various references
- Unicode charset mappings
The Unicode consortium makes
available plenty of charset mappings
for converting "legacy" charsets to
Unicode.
- Normalisation et
internationalisation: Inventaire et
prospectives des normes clefs pour le
traitement informatique du français.
(392p.)
This is a report, written in
French, discussing charset issues and
many other topics as well. Laurent
Bourbeau <bourbeau@progiciels-bpi.ca>
and François Pinard <pinard@iro.umontreal.ca>,
1995-10.
- Recode specific
- ETL presentation
In 1999, the organisers of the
m17n99 conference in Tsukuba,
Japan, were kind enough to invite me.
This was a fabulous trip
and experience for me, and I met many
extraordinary people there. At the
conference, I presented the
Translation Project, and Recode. The
Recode presentation slides
are available.
- libiconv
This comprehensive charset converter
library revolves around Unicode, and
supports Asian encodings among many
others. Even Recode uses it!
Bruno Haible <haible@ilog.fr>
- tcs
Here is the main recoding tool from
the Plan9 project.
- yudit
This GUI editor handles many
encodings, among which UTF-8. It also
installs uniconv, a recoding program, and
uniprint, a printing tool.
Gaspar Sinai <gsinai@iname.com>,
1999-01.
- ucs-fonts
These 6x13 fonts, covering Unicode
characters besides the Asian sets,
merely replace the Linux fixed 6x13
font. They work nicely with yudit.
Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>,
1998-11.
- MtRecode
This charset converter is oriented
towards SGML text manipulation. It may be
freely downloaded for non-commercial,
non-military use from:
Pointer given by Jean Véronis
<veronis@univ-aix.fr>,
1996-06.
- sp
This quite nice SGML structure
analyser contains internal C++ modules
for handling many charsets.
James Clark <jjc@jclark.com>
- b2c
This program is able to generate
interpreted character dumps, but properly
embedded within complete C header
files.
Jörg Heitkötter <Joerg.Heitkoetter@de.uu.net>,
1997-11.
- PyRecode
This wrapper provides Recode
functionality to Python programs.
Andreas Jung <ajung@server.python.net>
Also see:
Please write to mailto:recode-bugs@iro.umontreal.ca
if you are aware of various ports to non-Unix
systems not listed here, or for corrections.
Please provide the target system, a complete
and stable URL, the maintainer name and
address, the Recode version used as a base,
and your comments.
- MSDOS (DJGPP)
Juan Manuel Guerrero <juan.guerrero@gmx.de>
maintains this port, dated 2001-03 and
based on Recode 3.5. The following
archives hold binaries, docs and sources
respectively.
See contrib/DJGPP/README in
the Recode distribution for more
information about compiling this
port.
- MSDOS (Gnuish)
Darrel Hankerson <hankedr@mail.auburn.edu>
maintains this port, dated 1994-11 and
based on Recode 3.4. You get many GNU
tools, not only Recode. The GNUish
project is described in gnuish_t.htm.
- OS/2 (using emx/gcc)
Maintainer unknown (maybe Kai Uwe
Rommel <rommel@ars.de>),
dated 1994-11 and based on Recode
3.4.