Is it possible to use the Windows speech recognition engine in a word pronunciation game?

Posted on 2024-09-02 02:33:05 · 436 words · 13 views · 0 comments

I am creating an application that uses the Windows speech recognition engine (SAPI). It is a pronunciation game that gives you a score when you pronounce a word correctly. When I started experimenting with SAPI, recognition was poor unless I loaded an XML grammar, in which case it gave the best recognition results.

The problem now is that, with a grammar loaded, the engine recognizes whichever grammar entry is closest to what was said. For example:

Database -> "dedebase" -> correct.

Even if you mispronounce the word, it is accepted as correct.

Without the XML grammar, when you say "database" it returns "in the base", "the base", "data base", etc.

Please post your answers, suggestions, or clarifications. I will vote for the best answer.

Is this possible or not?

By the way, I am using the Delphi compiler for this project.


Comments (3)

苦妄 2024-09-09 02:33:05

I'd do three things:

  1. Convert the original text to phonemes by using ISpEnginePronunciation::GetPronunciations.
  2. Use a dictation grammar with the pronunciation language model to force SAPI to return a set of phonemes. Do this by calling ISpRecoGrammar::LoadDictation(L"Pronunciation", SPLO_STATIC).
  3. Compare the recognized phonemes to the target phonemes.

Note that ISpEnginePronunciation isn't available in SAPI 5.1, so this approach is limited to Windows Vista and Windows 7.
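
Step 3 could be sketched roughly as follows. This is only an illustration in Python, not SAPI or Delphi code: it assumes the target and recognized phonemes have already been obtained as lists of symbol strings, and the phoneme symbols and the `phoneme_diff` helper are made up for the example rather than taken from real SAPI output.

```python
from difflib import SequenceMatcher


def phoneme_diff(target, recognized):
    """Return the edit operations where the two phoneme lists disagree."""
    # autojunk=False keeps the matcher from treating frequent items as junk.
    matcher = SequenceMatcher(a=target, b=recognized, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, target[i1:i2], recognized[j1:j2]))
    return diffs


# Hypothetical phoneme sequences for "database" and the "dedebase" attempt.
target = ["d", "ey", "t", "ax", "b", "ey", "s"]
recognized = ["d", "eh", "d", "eh", "b", "ey", "s"]

for tag, expected, heard in phoneme_diff(target, recognized):
    print(tag, "expected:", expected, "heard:", heard)
```

An empty result means the recognized phonemes match the target exactly; otherwise each entry shows which stretch of the word was mispronounced, which could be used to give the player feedback.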

提赋 2024-09-09 02:33:05

For what you want, it is probably best not to use a grammar. However, that requires users to complete the "minimal" basic training of the speech recognition engine. It is not very long and is relatively pleasant, and it really makes a difference to recognition accuracy (believe me, I have a strong French accent in my English).
It can even be included as preliminary practice for the game itself.
You may find it interesting to watch this CodeRage 4 session: "Speech Enabling Delphi Applications (zip)" (cc.embarcadero.com/download.aspx?id=27264).

又爬满兰若 2024-09-09 02:33:05

If the point of the game is to encourage the user to speak using pronunciation that is closest to "standard pronunciation" for a given language (e.g. EN-US), then having the user train the recognizer to adapt to the user's particular (unmodified) speech patterns may be counterproductive. You would in part be training the recognizer to be more forgiving of the user's pronunciation lapses.

Whether you end up using grammar-based recognition or dictation-based recognition (Eric Brown's post looks very promising), you will probably also want to look into "confidence" scores. These scores are available after a recognition has been performed, and they give a numeric value to how confident the recognizer is that what the user actually said matches what the recognizer thinks the user said. Depending on the recognizer configuration and use case, confidence scores may or may not be meaningful.

If you are basing your accuracy score on the textual representation of the phones/phonemes/pronunciation, a quick and easy way to get an accuracy score would be to use Levenshtein distance, an algorithm for which many implementations are freely available on the net. A better scoring algorithm might be a resynchronizing diff, with the atomic unit of comparison being a single phone.
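
As a rough illustration of the Levenshtein idea, here is a minimal Python sketch that treats each phone as one unit of comparison. The function names and the normalization into a 0..1 score are assumptions made for this example, not part of SAPI; the same logic should port to Delphi without much trouble.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; each edit costs 1."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def pronunciation_score(target, recognized):
    """Map the distance to a 0..1 score; 1.0 means identical sequences."""
    longest = max(len(target), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(target, recognized) / longest
```

Because the functions accept any sequence, they can be called with lists of phone symbols (one list element per phone) rather than raw strings, so a multi-character phone symbol still counts as a single edit.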

Here are some keywords for MSDN doc hunting:
ISpRecoResult -> GetPhrase -> SPPHRASE -> Rule -> SPPHRASERULE -> SREngineConfidence.

http://msdn.microsoft.com/en-us/library/ee413319%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/ms720460%28v=VS.85%29.aspx
