Is there a simple example of iconv transliteration from one language to another in C?
Say we have a simple scenario: a string in some language, say French.
We want that French text to be converted to ASCII in a transliterated form.
How can this be done in C in the simplest way?
Also, is there a completely different way, unrelated to iconv, ideally multiplatform?
Comments (1)
If you want multiplatform, iconv is not the right tool. Transliteration is a GNU-specific extension. In general, transliteration is a hard problem, and the GNU iconv implementation is only sufficient for trivial cases. How a non-ASCII character gets transliterated is not a property of the character but of the language of the text and how it's being used. For instance, should "日" become "ri" or "ni" or something else entirely? Or if you want to stick with Latin-based languages, should "ö" become "o" or "oe"? Expanding to other non-Latin scripts, transliterating most Indic languages is fairly straightforward, but transliterating Thai requires some reordering of characters, and transliterating Tibetan requires parsing whole syllables and identifying which letters are in root/prefix/suffix/etc. roles.

In my opinion, the best answer to "How do I transliterate to ASCII?" for most software programs is: don't. Instead, fix whatever bugs or intentionally English-centric policies made you want ASCII in the first place. The only software that should really be doing transliteration is highly linguistically aware software facilitating search or interpretation of texts not in the user's own native language.