Which Lucene analyzer can be used to handle Japanese text?

Which Lucene analyzer can be used to handle Japanese text properly? It should be able to handle Kanji, Hiragana, Katakana, Romaji, and any combination of them.
2 Answers
You should probably look at the CJK package that is in the contrib area of Lucene. There is an analyzer and a tokenizer specifically for dealing with Chinese, Japanese, and Korean.
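The CJK analyzer does not do dictionary-based word segmentation; it indexes runs of CJK characters as overlapping character bigrams, while Latin-script text such as Romaji is kept as whole lowercased word tokens. As a rough, self-contained sketch of that bigram scheme (this is not the Lucene API itself, and the character-range check below is a simplification of the real analyzer's Unicode handling):

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigram {
    // Rough check for Hiragana, Katakana, and common Kanji.
    // Assumption: the real analyzer covers more Unicode blocks
    // (half-width forms, extensions) than this sketch does.
    static boolean isCjk(char c) {
        return (c >= 0x3040 && c <= 0x30FF)   // Hiragana + Katakana
            || (c >= 0x4E00 && c <= 0x9FFF);  // common CJK ideographs
    }

    // Emit overlapping bigrams for CJK runs and whole lowercased
    // words for Latin/digit runs, mimicking the bigram approach.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                int start = i;
                while (i < text.length() && isCjk(text.charAt(i))) i++;
                if (i - start == 1) {
                    tokens.add(text.substring(start, i)); // lone char: keep as unigram
                } else {
                    for (int j = start; j + 2 <= i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))
                        && !isCjk(text.charAt(i))) i++;
                tokens.add(text.substring(start, i).toLowerCase());
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A mixed Romaji/Japanese string produces a word token plus bigrams.
        System.out.println(tokenize("Lucene で日本語"));
    }
}
```

The bigram approach needs no dictionary, which is why it handles Chinese, Japanese, and Korean uniformly, but it can return false matches that a morphological analyzer would avoid.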
I found lucene-gosen while doing a search for my own purposes:
Their example looks fairly decent, but I guess it's the kind of thing that needs extensive testing. I'm also worried about their backwards-compatibility policy (or rather, the complete lack of one.)