Why is the confidence of my Microsoft speech recognition results always equal to -1?

Posted on 2024-10-20 16:31:26

I am using the Microsoft Speech SDK to implement software that uses voice recognition.

I feed the recognition engine a fairly ordinary grammar. When I start the engine and say something the grammar covers, it recognizes what I say, but the returned Result object has a Confidence value of -1.

In addition, all SemanticValue objects contained in the result also have a confidence of -1.

I cannot find any explanation of such a result in the related MSDN pages; they only state that typical confidence values should be between 0 and 1.

What does a value of -1 mean? Does it have something to do with the grammar?

Edit: additional information:

  • I am using the System.Speech classes to interact with the voice recognition engine.
  • The recognition engine is Microsoft English Recognizer v5.1.
  • I am running the program on XP, so the Speech SDK is also 5.1.
  • The input comes from a microphone. I have not found a way to feed this recognition engine from a file, although that would help me a lot.


Comments (1)

眼泪都笑了 2024-10-27 16:31:26

In SAPI, SREngineConfidence is an attempt to pass the phrase confidence from the vendor-specific speech engine to the engine-independent SAPI client. It has some interesting behavior, described in the "Microsoft Speech SDK Version 5.1 SR Engine Vendor Porting Guide".

http://msdn.microsoft.com/en-us/library/ee431799(v=VS.85).aspx#_Toc503606917 says:

"It is possible for confidence score information to be included in recognition results. On each phrase element there are two confidence fields that the engine can set. These have both a Confidence (three-level) field and an SREngineConfidence (floating-point) field. If the engine does not explicitly set any of these values, SAPI will try and produce reasonable default values for them. It will produce the Confidence values by averaging the levels for each of the words in the phrase or property, and it will set the SREngineConfidence values to -1.0."

and later says:

"If this field is not being used, the engine sets this confidence to -1.0."
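
Given that the guide describes -1.0 as meaning "field not used", client code can treat it as a sentinel instead of a real score. The following is only a minimal sketch of that idea: the `ConfidenceHelper` class and its method names are hypothetical, and treating every negative value (not just exactly -1.0) as "not provided" is an assumption, not something the guide states.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of System.Speech or SAPI.
static class ConfidenceHelper
{
    // Returns null when the value looks like the "field not used" sentinel.
    // Assumption: any negative value is treated as "not provided".
    public static float? Normalize(float rawConfidence)
    {
        return rawConfidence < 0f ? (float?)null : rawConfidence;
    }

    // Formats a confidence value for logging, making the sentinel explicit.
    public static string Describe(float rawConfidence)
    {
        float? score = Normalize(rawConfidence);
        return score.HasValue
            ? score.Value.ToString("0.00", CultureInfo.InvariantCulture)
            : "not provided by engine";
    }
}
```

With this in place, a caller could pass `result.Confidence` to `ConfidenceHelper.Describe` and log "not provided by engine" instead of a misleading -1.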

One other resource that may give you some insight is http://gotspeech.net/forums/thread/3613.aspx. One post says:

"In principle, the SREngineConfidence score is a value between 0.0 and 1.0 {higher value meaning higher confidence}. But older versions of the SR engines like 5.1 don't honor this contract precisely, and I don't think the value can really be used with those engines. Only the Hi, Medium, and Low scores in the other Confidence field are usable.

If I remember rightly, you need a more recent version of the SR engine, like the versions that ship with Microsoft Office 2003 or Vista to get a meaningful number in the SREngineConfidence field."

Edits:

I believe System.Speech.Recognition is really a .NET wrapper around SAPI (see http://msdn.microsoft.com/en-us/magazine/cc163663.aspx). I suspect that the comments quoted above about confidence values of -1 still apply to you when using System.Speech, and that the -1 you are seeing is the same issue.

My understanding is that XP did not include a recognizer; versions of Microsoft Office came with one. So I'm not sure which recognizer engine you are really running. Do you have Office 2003 installed? Or do you have a third-party engine like Dragon installed?
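
One way to check which engines are actually present is to list the installed recognizers from System.Speech. This is a rough sketch, assuming the project references System.Speech.dll and runs on Windows; the class name `ListRecognizers` is made up for illustration, and what gets printed depends entirely on the machine.

```csharp
using System;
using System.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // InstalledRecognizers() returns one RecognizerInfo per installed engine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} | {1} | {2}", info.Name, info.Culture, info.Description);
        }
    }
}
```

The name and description should make it clearer whether the engine in use is the 5.1 recognizer or a newer one installed by another product.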

You say you have recognizer 5.1 installed. The GotSpeech.NET link above says:

"But older versions of the SR engines like 5.1 don't honor this contract precisely, and I don't think the value can really be used with those engines."

I would suggest trying the following:

One more piece to add. Here is a short sample that recognizes speech from a WAV file:

    using System.Speech.Recognition;

    SpeechRecognitionEngine myRecognizer = new SpeechRecognitionEngine();
    Grammar myGrammar = CreatePizzaGrammar();       // uses GrammarBuilder to create a pizza ordering grammar
    myRecognizer.LoadGrammar(myGrammar);
    myRecognizer.SetInputToWaveFile("LargeCheese.wav");     // recording of ordering a pizza
    RecognitionResult result = myRecognizer.Recognize();    // synchronous; returns null if nothing was recognized
    if (result != null)
    {
        string s = result.Text;
        float confidence = result.Confidence;   // may be -1 on older engines, as discussed above
    }
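
The sample calls CreatePizzaGrammar() without showing it. The sketch below is one guess at what such a helper might look like using GrammarBuilder and Choices; the class name `PizzaGrammar`, the phrases, and the word lists are all invented for illustration and are not taken from the original answer.

```csharp
using System.Speech.Recognition;

static class PizzaGrammar
{
    // Hypothetical helper: builds a grammar for phrases such as
    // "I would like a large cheese pizza".
    public static Grammar CreatePizzaGrammar()
    {
        // Alternatives the speaker may choose from (illustrative values).
        Choices sizes = new Choices("small", "medium", "large");
        Choices toppings = new Choices("cheese", "pepperoni", "mushroom");

        GrammarBuilder builder = new GrammarBuilder();
        builder.Append("I would like a");
        builder.Append(sizes);
        builder.Append(toppings);
        builder.Append("pizza");

        return new Grammar(builder);
    }
}
```

Like the sample it supports, this needs a reference to System.Speech.dll and an installed recognizer for the grammar's language.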