Google AJAX Language API with Chinese

Posted 2024-08-16 22:24:33

Does anyone know if there is support for Chinese pinyin? I get the results here with correct Chinese pinyin (see "Show romanization" link).

Thank you.

Comments (3)

你好,陌生人 2024-08-23 22:24:33

I don't know whether the Google AJAX Language API supports converting to pinyin, but if it doesn't, it actually isn't too hard to do a passable conversion on your own. (The reverse conversion, from pinyin to hanzi (characters), is much trickier, because pinyin is very lossy.)

To do the conversion yourself, grab Unihan.zip, a downloadable version of the Unihan database. The file you actually care about is Unihan_Readings.txt. It also contains a bunch of stuff you don't care about, and it's stored in a pretty inefficient way, so don't be too worried about the large file size. You should extract the stuff you care about and store it in a more efficient way.

In it you'll find tab-delimited lines like this:

U+597D  kCantonese      hou2 hou3
U+597D  kDefinition     good, excellent, fine; well
U+597D  kHangul         호
U+597D  kHanyuPinlu     hao3(6060) hao1(142) hao4(115)
U+597D  kHanyuPinyin    21028.010:hǎo,hào
U+597D  kJapaneseKun    KONOMU SUKU YOI
U+597D  kJapaneseOn     KOU
U+597D  kKorean         HO
U+597D  kMandarin       HAO3 HAO4
U+597D  kTang           *xɑ̀u *xɑ̌u
U+597D  kVietnamese     háo
U+597D  kXHC1983        0445.030:hǎo 0448.030:hào

The left column ("U+597D") is the Unicode code point, the middle column is the attribute name, and the right column is the attribute value. You can extract either the kHanyuPinyin attribute or the kMandarin attribute. They encode basically the same information, so go with whichever format is easier for you to deal with. (In case it isn't obvious: hǎo == HAO3 and hào == HAO4.)
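
As a rough illustration of the extraction step, here is a minimal Python sketch that reads lines in the three-column layout shown above and keeps only the two pinyin-related attributes. The function name parse_readings and the inline SAMPLE string are made up for this example; in practice you would pass an open file handle for Unihan_Readings.txt instead. It assumes comment lines start with "#" and splits on any whitespace so that it tolerates both tabs and the spaces shown in this post.

```python
from collections import defaultdict

# A few lines copied from the sample above, tab-delimited as in the real file.
SAMPLE = """\
U+597D\tkCantonese\thou2 hou3
U+597D\tkHanyuPinyin\t21028.010:hǎo,hào
U+597D\tkMandarin\tHAO3 HAO4
"""


def parse_readings(lines, wanted=("kMandarin", "kHanyuPinyin")):
    """Map each character to {attribute name: raw value} for the wanted attributes."""
    table = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 2)  # code point, attribute name, attribute value
        if len(parts) != 3:
            continue  # not a data line in the expected layout
        codepoint, attribute, value = parts
        if attribute in wanted:
            # "U+597D" -> the character whose code point is 0x597D
            char = chr(int(codepoint[2:], 16))
            table[char][attribute] = value
    return dict(table)


readings = parse_readings(SAMPLE.splitlines())
print(readings["好"]["kMandarin"])
```

The resulting dictionary is much smaller than the source file and could be pickled or written to a compact format of your choice, which is one way to "store it in a more efficient way".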

You'll notice that some characters (like the example I've chosen here) have multiple pronunciations. This is the one tricky bit. Depending on how much precision you want, you may be able to get away with just using the first romanization listed, as they're in order of decreasing frequency. (This is one of the places where kHanyuPinyin differs a bit from kMandarin: it has multiple lists of pronunciations, each ordered by frequency.)
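
The "just take the first romanization" shortcut could look something like the sketch below. The names READINGS, first_reading and romanize are hypothetical, and the kHanyuPinyin parsing assumes each space-separated entry has the shape "location:reading,reading" as in the sample line. Characters without a known reading are passed through unchanged. This deliberately ignores context, so it will pick the wrong reading for some characters.

```python
# Raw attribute values for U+597D, copied from the sample lines above.
READINGS = {
    "好": {"kMandarin": "HAO3 HAO4", "kHanyuPinyin": "21028.010:hǎo,hào"},
}


def first_reading(attributes):
    """Return the first listed reading, preferring kHanyuPinyin; None if absent."""
    if "kHanyuPinyin" in attributes:
        # Take the first entry, drop the "location:" prefix, keep the first reading.
        first_entry = attributes["kHanyuPinyin"].split()[0]
        return first_entry.partition(":")[2].split(",")[0]
    if "kMandarin" in attributes:
        # Tone-numbered form, e.g. "HAO3" -> "hao3".
        return attributes["kMandarin"].split()[0].lower()
    return None


def romanize(text, readings=READINGS):
    """Replace each character with its first reading; keep unknown characters as-is."""
    out = []
    for char in text:
        reading = first_reading(readings.get(char, {}))
        out.append(reading if reading is not None else char)
    return " ".join(out)


print(romanize("好"))
```

Because the lookup is per character, it has no idea which reading a multi-character word actually calls for; that is the precision you give up with this approach.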

爱的故事 2024-08-23 22:24:33

You can trick the API into giving you Pinyin by translating from Chinese to Chinese. Sample link.

爱你是孤单的心事 2024-08-23 22:24:33

Google Translate includes "show/hide romanization", which is better than Unihan for two reasons. First, known words are grouped together properly (at least it tries to do that). Second, many Chinese characters have more than one possible pronunciation, and figuring out which pinyin transliteration is the right one is not a trivial problem. That's what the translation engine does.
