How to fix a strange iconv issue on Mac OS X
I am on Mac OS X 10.5 (but I reproduced the issue on 10.4).
I am trying to use iconv to convert a UTF-8 file to ASCII.
The UTF-8 file contains characters like 'éàç'.
I want the accented characters to be turned into their closest ASCII equivalents,
so my command is this:
iconv -f UTF-8 -t ASCII//TRANSLIT//IGNORE myutf8file.txt
This works fine on a Linux machine,
but on my local Mac OS X I get this, for instance:
è => 'e
à => `a
I really don't understand why iconv returns this weird output on Mac OS X when all is fine on Linux.
Any help or directions?
Thanks in advance.
Comments (4)
The problem is that Mac OS X uses another implementation of iconv, called libiconv. Most Linux distributions ship an implementation of iconv that is part of libc. Unfortunately, libiconv transliterates characters such as ö, è and ñ as "o, `e and ~n. The only fix I found is to download the libiconv source and modify the translit.h file in the lib directory. Find lines that look like this:
2, '"', 'o',
and replace them with something like this:
1, 'o',
I spent hours on Google trying to figure out the answer to this problem and finally decided to download the source and hack around with it. Hope this helps someone!
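To check which implementation a given machine uses, the first line of iconv --version can be inspected. This is a minimal sketch, not a definitive test: the helper name iconv_flavor is hypothetical, and it assumes that libiconv banners contain "libiconv" and glibc banners contain "GNU libc" or "GLIBC".

```shell
# Hypothetical helper: classify an iconv version banner read from stdin.
# Assumes libiconv banners mention "libiconv" and glibc banners mention
# "GNU libc" or "GLIBC"; anything else is reported as unknown.
iconv_flavor() {
  case "$(head -n 1)" in
    *libiconv*) echo "libiconv" ;;
    *"GNU libc"*|*GLIBC*) echo "glibc" ;;
    *) echo "unknown" ;;
  esac
}

# Usage: iconv --version 2>&1 | iconv_flavor
```

If this reports libiconv, the accent marks in front of the base letters are likely coming from the transliteration table described above.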
I found a workaround suitable for my needs (just to clarify: a script gets a string and converts it to a "permalink" URL).
My workaround consists of piping the iconv output through a sed filter (tested on OS X Yosemite).
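The exact filter is not reproduced here, so the following is only a sketch of the idea and not the answerer's script. It assumes the stray marks to remove are ' ` " ~ and ^, and the helper names strip_translit_marks and slugify are hypothetical. Note that it also removes genuine apostrophes and quotes, which is usually acceptable for a permalink but not for general text.

```shell
# Hypothetical helper: drop the accent marks that libiconv emits in front
# of the base letter during //TRANSLIT (assumed set: ' ` " ~ ^).
strip_translit_marks() {
  sed "s/['\`\"~^]//g"
}

# Hypothetical helper: turn UTF-8 text on stdin into a permalink-style slug.
slugify() {
  iconv -f UTF-8 -t ASCII//TRANSLIT |
    strip_translit_marks |
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Usage: printf '%s\n' "$title" | slugify
```

Keeping the mark-stripping step in its own function makes it possible to reuse it after any iconv call, not only when building slugs.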
My guess is that the locale is set differently on your Linux machine.
As far as I can remember, iconv uses the current locale to translate UTF-X, and by default Mac OS X has the locale set to "C", which (obviously) does not handle accents and language-specific characters. Maybe try setting a different locale before running iconv.
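One way to try this guess is to run iconv with a UTF-8 locale for that single command. This is a sketch under assumptions: the wrapper name with_utf8_locale is hypothetical, and en_US.UTF-8 is assumed to be installed (locale -a lists the available locales). glibc's iconv does consult the locale when transliterating; whether this changes libiconv's output on Mac OS X is not confirmed in this thread.

```shell
# Hypothetical wrapper: run a command with a UTF-8 locale.
# Assumes en_US.UTF-8 is installed; `locale -a` lists what is available.
with_utf8_locale() {
  env LC_ALL=en_US.UTF-8 "$@"
}

# Usage: with_utf8_locale iconv -f UTF-8 -t ASCII//TRANSLIT myutf8file.txt
```

Setting the variable through env keeps the change local to the one command, so the rest of the shell session keeps its original locale.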
Another option is to use unaccent, which is installed by brew install unac.
unaccent does not convert characters in decomposed form (such as LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT), but you can use uconv to convert characters to composed form first.
brew install icu4c; ln -s /usr/local/opt/icu4c/bin/uconv /usr/local/bin installs uconv.