How to classify Japanese characters as Kanji, Katakana, or Hiragana?
I am working on an application that requires sorting Japanese text.
Sorting Japanese requires converting Katakana and Kanji to Hiragana and then sorting by UTF-8 code order.
Hiragana, Katakana, and Kanji entries should be combined and sorted by their Hiragana equivalent "spelling". Note: this uses the Hiragana "alphabet" order: a, i, u, e, o, ka, ki, ku, ke, ko, etc.
To do this, I need to:
1. Classify Japanese characters as Kanji, Katakana, or Hiragana.
2. Convert Katakana and Kanji to Hiragana.
3. Apply an algorithm that sorts based on phonetic sound (Hiragana).
The application's database is in UTF-8.
For the first step, "Classify Japanese characters as Kanji, Katakana, or Hiragana", I want to know whether SQLite3, Qt, ICU, or any other package offers a C or C++ API that returns the Unicode code point of a character.
Based on the Unicode code point, we can easily classify Japanese characters.
Please correct me if I am wrong.
Comments (1)
As you say, Japanese characters can easily be sorted into groups using Unicode. This is trivial.
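To make that concrete, here is a minimal C++ sketch that decodes UTF-8 into code points and classifies each one by its Unicode block. The names `JaScript`, `classify`, and `decode_utf8` are made up for this illustration, the decoder assumes well-formed input and does no validation, and only the main blocks are covered (Hiragana, Katakana, CJK Unified Ideographs and Extension A). Note that the Katakana block also contains shared marks such as the prolonged sound mark (U+30FC), so block membership is only an approximation. Libraries such as ICU (`uscript_getCode`) or Qt (`QChar::script`) expose the script property directly and are probably the better choice in production code.

```cpp
#include <cstddef>
#include <string>

// Script categories relevant to Japanese text (hypothetical names).
enum class JaScript { Hiragana, Katakana, Kanji, Other };

// Classify one code point by the Unicode block it falls in.
inline JaScript classify(char32_t cp) {
    if (cp >= 0x3040 && cp <= 0x309F) return JaScript::Hiragana;  // Hiragana block
    if (cp >= 0x30A0 && cp <= 0x30FF) return JaScript::Katakana;  // Katakana block
    if ((cp >= 0x4E00 && cp <= 0x9FFF) ||                         // CJK Unified Ideographs
        (cp >= 0x3400 && cp <= 0x4DBF))                           // CJK Extension A
        return JaScript::Kanji;
    return JaScript::Other;
}

// Decode UTF-8 into code points. Assumes well-formed input: no validation
// of continuation bytes, overlong forms, or surrogates is attempted.
inline std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (b < 0x80)             { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6) { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE) { cp = b & 0x0F; extra = 2; }
        else                      { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

A string read from the UTF-8 database can then be passed through `decode_utf8` and each resulting code point handed to `classify`.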
Converting katakana to hiragana is also trivial, as there is a one-to-one mapping. You can convert kanji to hiragana via Kakasi.
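The katakana half of that can be sketched in a few lines of C++, relying on the fact that the katakana letters U+30A1 to U+30F6 sit exactly 0x60 above their hiragana counterparts. The function names `kana_to_hiragana` and `text_to_hiragana` are invented for this example. The handful of katakana with no hiragana counterpart (for example U+30F7 to U+30FA) and the prolonged sound mark are simply left unchanged here. Kanji are not handled, since they need a dictionary-based tool such as Kakasi.

```cpp
#include <string>

// Map one katakana code point to hiragana; anything else passes through.
// U+30A1..U+30F6 map to U+3041..U+3096, and the iteration marks
// U+30FD/U+30FE map to U+309D/U+309E, all by the same 0x60 offset.
inline char32_t kana_to_hiragana(char32_t cp) {
    if ((cp >= 0x30A1 && cp <= 0x30F6) || cp == 0x30FD || cp == 0x30FE)
        return static_cast<char32_t>(cp - 0x60);
    return cp;
}

// Apply the mapping to every code point of a decoded string.
inline std::u32string text_to_hiragana(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t cp : text) out.push_back(kana_to_hiragana(cp));
    return out;
}
```

The input is expected to be a sequence of code points, so UTF-8 data has to be decoded first and re-encoded afterwards.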
Sorting can be done by converting to hiragana first. However, this is a poor man's sort, as many kanji are homophones (same sound, different kanji). So you should sort by the kanji before converting and sorting by hiragana.
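One way to get the same effect in a single pass is to sort on the hiragana reading and use the original text as a tie-breaker, so homophones come out in a deterministic order. The sketch below assumes each record already carries its reading (produced beforehand, for example by Kakasi). The `Entry` struct and `sort_by_reading` are hypothetical names. Comparison is by raw code point, which only approximates a proper Japanese collation.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// One sortable record (hypothetical layout). `reading` is assumed to have
// been computed elsewhere and stored alongside the original text.
struct Entry {
    std::u32string text;     // original form, may contain kanji or katakana
    std::u32string reading;  // hiragana reading, used as the primary key
};

// Order by reading first, then by the original text, so that homophones
// written with different kanji are grouped and ordered consistently.
inline void sort_by_reading(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  return std::tie(a.reading, a.text) <
                         std::tie(b.reading, b.text);
              });
}
```

If the ordering has to match what Japanese users expect from a dictionary, a locale-aware collator (such as the one ICU provides) is likely a better fit than code point comparison.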
You don't say why you need to sort in this way. Maybe we can suggest a better sort if you tell us more about your application.