Converting Pinyin written with tone accents to numbered form

Posted on 2024-10-01 11:36:38

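I am looking to convert Pinyin where the tone marks are written with accents (e.g. Nín hǎo) to Pinyin written in numerical/ASCII form (e.g. Nin2 hao3).

Does anyone know of a library that does this, preferably in PHP? Or could someone who knows Chinese/Pinyin comment?

I started writing a fairly simple one myself, but I don't speak Chinese and don't fully understand the rules for when words should be separated by spaces.

I was able to write a translator that converts:

Nín hǎo. Wǒ shì zhōng guó rén ==> Nin2 hao3. Wo3 shi4 zhong1 guo2 ren2

But how do you deal with words like the ones below? Do they get split with spaces into multiple words, or do you insert the tone numbers inside the word (and if so, where?): huā shíjiān, wèishénme, yuèláiyuè, shēngbìng, etc.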


故事灯 2024-10-08 11:36:38

The problem with parsing pinyin that has no spaces between syllables is ambiguity. Take, for instance, the name of an ancient Chinese capital, 长安: Cháng'ān (note the disambiguating apostrophe). If we strip out the apostrophe, it can be read in two ways: Chán gān or Cháng ān. A Chinese speaker would tell you that the second is far more likely, depending on the context of course, but your program cannot make that call from the spelling alone.
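
To make the ambiguity concrete, here is a minimal sketch in Python (not PHP, and not taken from any library). It enumerates every way a tone-less string can be cut into syllables. `SAMPLE_SYLLABLES` and `all_splits` are hypothetical names, and the syllable set is a tiny hand-picked sample rather than the full pinyin table.

```python
# Hypothetical sample: a tiny subset of valid syllables, just enough for the demo.
SAMPLE_SYLLABLES = {"chan", "chang", "an", "gan"}


def all_splits(plain):
    """Return every way to split tone-less pinyin into sample syllables."""
    if not plain:
        return [[]]  # exactly one way to split the empty string
    results = []
    for size in range(1, len(plain) + 1):
        head = plain[:size]
        if head in SAMPLE_SYLLABLES:
            # Keep this head and try every split of the remainder.
            for rest in all_splits(plain[size:]):
                results.append([head] + rest)
    return results


print(all_splits("changan"))  # [['chan', 'gan'], ['chang', 'an']]
```

Both readings come back as equally valid, so picking between them would need something beyond the syllable list, such as a dictionary of words.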

Assuming there is no ambiguity and all input is valid, I would do it something like this:

Create an accent-folding function
Create an array of valid pinyin syllables (you can take it from the Wikipedia page on pinyin)
Match each word against the list of valid pinyin syllables
When the last character could belong to the next syllable, look ahead to the next syllable before deciding, for example:

 shēngbìng
     ^ Does this 'g' belong to the next word?
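
The steps above can be sketched roughly as follows, in Python rather than PHP. This is only an illustration under stated assumptions: `SAMPLE_SYLLABLES` is a small hand-picked subset (a real version would load the full syllable table), all function names are hypothetical, the tone digit is placed after each syllable, and neutral-tone syllables get no digit. The lookahead is done by backtracking: the longest candidate is tried first and shortened if the rest of the word cannot be matched.

```python
import re
import unicodedata

# Combining marks used for the four tones, mapped to their digits.
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}

# Assumption: a small sample only; replace with the full table of valid syllables.
SAMPLE_SYLLABLES = {
    "nin", "hao", "wo", "shi", "zhong", "guo", "ren", "hua",
    "jian", "wei", "shen", "me", "yue", "lai", "sheng", "bing",
}
MAX_SYLLABLE_LEN = 6  # e.g. "zhuang"


def split_syllables(plain):
    """Split tone-less lowercase pinyin, longest match first, with backtracking."""
    if not plain:
        return []
    for size in range(min(MAX_SYLLABLE_LEN, len(plain)), 0, -1):
        head = plain[:size]
        if head in SAMPLE_SYLLABLES:
            rest = split_syllables(plain[size:])
            if rest is not None:  # the remainder also parses, so keep this cut
                return [head] + rest
    return None  # no valid segmentation found


def convert_word(word):
    """Fold accents in one word and append a tone digit after each syllable."""
    chars = []  # one [letter, tone_digit] entry per base letter
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS and chars:
            chars[-1][1] = TONE_MARKS[ch]
        elif unicodedata.combining(ch) and chars:
            # Non-tone marks (the diaeresis of ü) stay attached to the letter.
            chars[-1][0] = unicodedata.normalize("NFC", chars[-1][0] + ch)
        else:
            chars.append([ch, ""])
    syllables = split_syllables("".join(letter.lower() for letter, _ in chars))
    if syllables is None:
        return word  # unknown word: leave it untouched
    out, pos = [], 0
    for syllable in syllables:
        segment = chars[pos:pos + len(syllable)]
        pos += len(syllable)
        tone = next((t for _, t in segment if t), "")  # neutral tone: no digit
        out.append("".join(letter for letter, _ in segment) + tone)
    return "".join(out)


def convert(text):
    """Convert every run of letters in the text; punctuation and spaces are kept."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"[^\W\d_]+", lambda m: convert_word(m.group()), text)


print(convert("Nín hǎo. Wǒ shì zhōng guó rén"))  # Nin2 hao3. Wo3 shi4 zhong1 guo2 ren2
print(convert("shēngbìng wèishénme"))            # sheng1bing4 wei4shen2me
```

Words that cannot be segmented with the sample table are returned unchanged, which makes gaps in the table easy to spot. For genuinely ambiguous input the sketch simply takes the first segmentation it finds.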

Anyway, where to place the tone numerals and which numeral represents each accent are both covered fairly well in this section of the Wikipedia article on pinyin: http://en.wikipedia.org/wiki/Pinyin#Numerals_in_place_of_tone_marks. You might also want to look at how IMEs (input method editors) handle this.
