How to fix a strange iconv problem on Mac OS X

Posted on 2024-07-18 11:34:33

I am on Mac OS X 10.5 (but I reproduced the issue on 10.4).

I am trying to use iconv to convert a UTF-8 file to ASCII.

The UTF-8 file contains characters like 'éàç'.

I want the accented characters to be turned into their closest ASCII equivalents, so my command is this:

iconv -f UTF-8 -t ASCII//TRANSLIT//IGNORE myutf8file.txt

This works fine on a Linux machine, but on my local Mac OS X I get this, for instance:

é => 'e

à => `a

I really don't understand why iconv returns this weird output on Mac OS X when all is fine on Linux.

Any help or directions?

Thanks in advance.

Comments (4)

醉城メ夜风 2024-07-25 11:34:33

The problem is that Mac OS X uses a different implementation of iconv, called libiconv. Most Linux distributions use the iconv implementation that is part of libc. Unfortunately, libiconv transliterates characters such as ö, è and ñ as "o, `e and ~n. The only way to fix this is to download the source and modify the translit.h file in the lib directory. Find lines that look like this:

2, '"', 'o',

and replace them with something like this:

1, 'o',

I spent hours on Google trying to find the answer to this problem, and finally decided to download the source and hack around with it. Hope this helps someone!
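
To check which implementation a given machine actually has, one option is to look at the first line of `iconv --version`. This is only a sketch: it assumes the banner names the implementation ("GNU libiconv" for libiconv, "GNU libc" or "GLIBC" for the libc one), and `classify_iconv` is a made-up helper name, not part of either tool.

```shell
#!/bin/sh
# Classify an iconv build from the first line of its `--version` banner.
# Assumption: the banner contains "libiconv" for libiconv builds and
# "GNU libc" / "GLIBC" / "glibc" for the glibc implementation.
classify_iconv() {
    # Read only the first line from stdin; tolerate a missing trailing newline.
    IFS= read -r banner || :
    case $banner in
        *libiconv*)                    echo libiconv ;;
        *"GNU libc"*|*GLIBC*|*glibc*)  echo glibc ;;
        *)                             echo unknown ;;
    esac
}
```

Typical use would be `iconv --version 2>/dev/null | classify_iconv`. A result of `unknown` just means the banner did not match either assumed pattern.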

无畏 2024-07-25 11:34:33

I found a workaround suitable for my needs (just to clarify: a script takes a string and converts it to a "permalink" URL).

My workaround consists of piping the iconv output through a sed filter:

echo á é ç this is a test | iconv -f utf8 -t ascii//TRANSLIT | sed 's/[^a-zA-Z 0-9]//g'

The result for the above in OS X Yosemite is:

a e c this is a test
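
The same pipeline can be wrapped in small functions so the sed step is reusable on its own. This is a sketch rather than a polished tool; the names `strip_translit_marks` and `to_ascii_text` are made up for illustration, and the last line feeds in simulated libiconv-style output so it runs without calling iconv.

```shell
#!/bin/sh
# Remove every character that is not an ASCII letter, digit, or space.
# Same sed expression as in the workaround above.
strip_translit_marks() {
    sed 's/[^a-zA-Z 0-9]//g'
}

# Hypothetical helper: transliterate UTF-8 text from stdin, then clean it up.
to_ascii_text() {
    iconv -f UTF-8 -t ASCII//TRANSLIT | strip_translit_marks
}

# Simulated libiconv-style output ('e `a "o ~n) piped through the filter.
printf '%s\n' "'e \`a \"o ~n" | strip_translit_marks   # prints: e a o n
```

Note that the filter also drops legitimate punctuation such as hyphens and commas, which is acceptable for permalinks but may not be for other uses.
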
一片旧的回忆 2024-07-25 11:34:33

My guess is that the locale is set differently on your Linux machine.
As far as I can remember, iconv uses the current locale to translate UTF-X, and by default Mac OS sets the locale to "C", which (obviously) does not handle accents and language-specific characters. Maybe try doing this before running iconv:

setLocale( LC_ALL, "en_EN");
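
From a shell, the closest equivalent is to set `LC_ALL` for the single iconv invocation. The sketch below uses a made-up wrapper name, `with_locale`, and assumes a locale such as `en_US.UTF-8` is installed (`locale -a` lists what is available). Whether this changes the output on a particular iconv build is exactly the guess above, so treat it as an experiment rather than a fix.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with LC_ALL set to the given locale,
# leaving the calling shell's own environment untouched.
with_locale() {
    loc=$1
    shift
    env LC_ALL="$loc" "$@"
}

# Example (assumes en_US.UTF-8 exists on the machine):
# with_locale en_US.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT myutf8file.txt
```

Running plain `locale` first shows which settings the shell would otherwise pass on to iconv.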


瑕疵 2024-07-25 11:34:33

Another option is to use unaccent, which is installed by brew install unac:

$ unaccent utf-8<<<é
e

unaccent does not convert characters in decomposed form (such as LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT), but you can use uconv to convert characters to composed form:

$ unaccent utf-8<<<$'e\u0301'
é
$ uconv -f utf-8 -t utf-8 -x NFC<<<$'e\u0301'|unaccent utf-8
e

brew install icu4c;ln -s /usr/local/opt/icu4c/bin/uconv /usr/local/bin installs uconv.
