ICU Custom Transliteration
I am looking to utilize the ICU library for transliteration, but I would like to provide a custom transliteration file for a set of specific custom transliterations, to be incorporated into the ICU core at compile time for use in binary form elsewhere. I am working with the source of ICU 4.2 for compatibility reasons.
As I understand it, from the ICU Data page of their website, one way of going about this is to create the file trnslocal.mk within ICUHOME/source/data/translit/, and within this file have the single line TRANSLIT_SOURCE_LOCAL=custom.txt.
For the custom.txt file itself, I used the following format, based on the master file root.txt:
custom{
    RuleBasedTransliteratorIDs {
        Kanji-Romaji {
            file {
                resource:process(transliterator){"custom/Kanji_Romaji.txt"}
                direction{"FORWARD"}
            }
        }
    }
    TransliteratorNamePattern {
        // Format for the display name of a Transliterator.
        // This is the language-neutral form of this resource.
        "{0,choice,0#|1#{1}|2#{1}-{2}}" // Display name
    }
    // Transliterator display names
    // This is the English form of this resource.
    "%Translit%Hex" { "%Translit%Hex" }
    "%Translit%UnicodeName" { "%Translit%UnicodeName" }
    "%Translit%UnicodeChar" { "%Translit%UnicodeChar" }
    TransliterateLATIN{
        "",
        ""
    }
}
I then store within the directory custom the file Kanji_Romaji.txt, as found here. Because it uses > instead of the → I have seen in other files, I converted each entry appropriately, so they now look like:
丁 → Tei ;
七 → Shichi ;
When I compile the ICU project, I am presented with no errors.
When I attempt to utilize this custom transliterator within a testfile, however (a testfile that works fine with the in-built transliterators), I am met with the error error: 65569:U_INVALID_ID.
I am using the following code to construct the transliterator and output the error:
UErrorCode status = U_ZERO_ERROR;
Transliterator *K_R = Transliterator::createInstance("Kanji-Romaji", UTRANS_FORWARD, status);
if (U_FAILURE(status))
{
    std::cout << "error: " << status << ":" << u_errorName(status) << std::endl;
    return 0;
}
Additionally, looping over Transliterator::countAvailableIDs() and Transliterator::getAvailableID(i) does not list my custom transliteration. I remember reading with regard to custom converters that they must be registered within /source/data/mappings/convrtrs.txt. Is there a similar file for transliterators?
It seems that my custom transliterator is either not being built into the appropriate packages (though there are no compile errors), is improperly formatted, or somehow not being registered for use. Incidentally, I am aware of the RuleBasedTransliterator route at runtime, but I would prefer to be able to compile the custom transliterations for use in any produced binary.
Let me know if any additional clarification is necessary. I know there is at least one ICU programmer on here, who has been quite helpful in other posts I have written and seen elsewhere as well. I would appreciate any help I can find. Thank you in advance!
Comments (1)
Transliterators are sourced from CLDR. You could add your transliterator to CLDR (the crosswire directory contains it in XML format in the cldr/ directory) and rebuild the ICU data. ICU doesn't have a simple mechanism for adding transliterators in the way you are trying to do. What I would do is forget about trnslocal.mk and custom.txt, as you don't need to add any files, and simply modify root.txt. You might file a bug if you have a suggested improvement.