Help with SAPI v5.1: SpeechRecognitionEngine always gives the same wrong result in C#

Posted 2024-11-11 04:16:30

I was playing around with the SAPI v5.1 library and testing it on a sample WAV file I have. (Download it from here). The sound in that file is clear and easy to make out. It contains only one word, the number three. When I run the following code, I get the number 8, or "eight". If I remove that choice, I get 7. If I randomize the list, I get different results, and so on. I'm getting really confused and starting to think that speech recognition in the SAPI library doesn't work at all...

Anyway, here is what I'm doing:

    private void button1_Click(object sender, EventArgs e)
    {
        //Add choices to grammar.
        Choices mychoices = new Choices();
        mychoices.Add("one");
        mychoices.Add("two");
        mychoices.Add("three");
        mychoices.Add("four");
        mychoices.Add("five");
        mychoices.Add("six");
        mychoices.Add("seven");
        mychoices.Add("eight");
        mychoices.Add("nine");
        mychoices.Add("zero");
        mychoices.Add("1");
        mychoices.Add("2");
        mychoices.Add("3");
        mychoices.Add("4");
        mychoices.Add("5");
        mychoices.Add("6");
        mychoices.Add("7");
        mychoices.Add("8");
        mychoices.Add("9");
        mychoices.Add("0");

        Grammar myGrammar = new Grammar(new GrammarBuilder(mychoices));

        //Create the engine.
        SpeechRecognitionEngine reco = new SpeechRecognitionEngine();

        //Read audio stream from wav file.
        reco.SetInputToWaveFile("3.wav");
        reco.LoadGrammar(myGrammar);

        //Get the recognized value.
        reco.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(reco_SpeechRecognized);

        reco.RecognizeAsync(RecognizeMode.Multiple);
    }

    void reco_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        MessageBox.Show(e.Result.Text);
    }

止于盛夏 2024-11-18 04:16:30

How did you create your WAV file? It looks like it has a high bitrate. The recognizer supports only certain formats. Try:

8 bits per sample
single channel mono
22,050 samples per second
PCM encoding

You have about 3 seconds of audio and the file size is 520 KB. That seems too big for the supported formats.
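
To make the size estimate above concrete, here is a rough sketch of the arithmetic. The class and method names (`WavSize`, `PcmBytes`) are made up for illustration, and the calculation ignores the small RIFF header.

```csharp
using System;

public static class WavSize
{
    // Approximate size in bytes of the uncompressed audio data
    // (the RIFF header adds only a few dozen bytes on top of this).
    public static long PcmBytes(int samplesPerSecond, int bitsPerSample, int channels, double seconds)
    {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        return (long)Math.Round(samplesPerSecond * blockAlign * seconds);
    }

    public static void Main()
    {
        // 8-bit mono at 22,050 Hz for about 3 seconds.
        Console.WriteLine(PcmBytes(22050, 8, 1, 3.0));  // 66150 bytes, roughly 65 KB
        // 16-bit mono at 22,050 Hz for about 3 seconds.
        Console.WriteLine(PcmBytes(22050, 16, 1, 3.0)); // 132300 bytes
    }
}
```

Either figure is far below 520 KB, which is why the file looks like it uses a different format.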

You can use the RecognizerInfo class to find the audio formats your recognizer supports; see the RecognizerInfo.SupportedAudioFormats property.
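
Something along these lines should list the formats for the default recognizer. It is a minimal sketch that assumes a Windows machine with a reference to System.Speech; the `ListFormats` class and its `Describe` helper are names invented here.

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

public static class ListFormats
{
    // Formats one supported audio format as a short multi-line description.
    public static string Describe(SpeechAudioFormatInfo fmt)
    {
        return string.Format(
            "  EncodingFormat = {0}\n  BitsPerSample = {1}\n  BlockAlign = {2}\n" +
            "  ChannelCount = {3}\n  SamplesPerSecond = {4}",
            fmt.EncodingFormat, fmt.BitsPerSample, fmt.BlockAlign,
            fmt.ChannelCount, fmt.SamplesPerSecond);
    }

    public static void Main()
    {
        // The parameterless constructor picks the default recognizer.
        using (var reco = new SpeechRecognitionEngine())
        {
            int index = 0;
            foreach (SpeechAudioFormatInfo fmt in reco.RecognizerInfo.SupportedAudioFormats)
            {
                Console.WriteLine("{0}:", index++);
                Console.WriteLine(Describe(fmt));
                Console.WriteLine();
            }
        }
    }
}
```

The list depends on the installed recognizer, so the output may differ from machine to machine.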

Update:

Your audio file is kind of a mess. It is very noisy. It is also in an unsupported format. Audacity reports it as stereo, 44.1 kHz, and 32-bit float. I silenced the noise in the beginning and end, resampled to 22.050 kHz, removed the stereo track, and then exported as uncompressed 8-bit unsigned WAV. It then works fine.
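
If you don't have Audacity at hand, you can also check the format by reading the WAV header yourself. The following is only a sketch: the `WavHeader` class is invented for this answer, it reads just the basic fields of the `fmt ` chunk, and it does not decode the extensible format tag (0xFFFE).

```csharp
using System;
using System.IO;
using System.Text;

public static class WavHeader
{
    public sealed class Info
    {
        public int FormatTag;        // 1 = PCM, 3 = IEEE float, 6 = A-law, 7 = mu-law
        public int Channels;         // 1 = mono, 2 = stereo
        public int SamplesPerSecond;
        public int BitsPerSample;
    }

    // Reads exactly four bytes and returns them as an ASCII chunk identifier.
    static string ReadId(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }

    public static Info Read(Stream stream)
    {
        var reader = new BinaryReader(stream);
        if (ReadId(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadInt32(); // overall RIFF size, not needed here
        if (ReadId(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        // Walk the chunks until the "fmt " chunk turns up.
        // Reading past the end of the stream throws EndOfStreamException.
        while (true)
        {
            string chunkId = ReadId(reader);
            int chunkSize = reader.ReadInt32();
            if (chunkId == "fmt ")
            {
                var info = new Info();
                info.FormatTag = reader.ReadUInt16();
                info.Channels = reader.ReadUInt16();
                info.SamplesPerSecond = reader.ReadInt32();
                reader.ReadInt32();  // average bytes per second
                reader.ReadUInt16(); // block align
                info.BitsPerSample = reader.ReadUInt16();
                return info;
            }
            // Skip other chunks; chunks are padded to an even length.
            reader.ReadBytes(chunkSize + (chunkSize & 1));
        }
    }

    public static void Main(string[] args)
    {
        using (FileStream file = File.OpenRead(args[0]))
        {
            Info info = Read(file);
            Console.WriteLine("tag={0} channels={1} rate={2} bits={3}",
                info.FormatTag, info.Channels, info.SamplesPerSecond, info.BitsPerSample);
        }
    }
}
```

For a file the recognizer accepts, you would expect a format tag of 1 (or 6 or 7), one channel, and a rate and bit depth that appear in the supported list.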

On my Windows 7 machine, my default recognizer supports only the following audio formats:

  0:
  Encodingformat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 16000

  1:
  Encodingformat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond  = 16000

  2:
  Encodingformat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 22050

  3:
  Encodingformat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond  = 22050

  4:
  Encodingformat = ALaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 22050

  5:
  Encodingformat = ULaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond  = 22050

You should also remove the numeric choices from the grammar. Right now the recognizer returns two alternates: "three" and "3". This probably isn't what you want. You could use a semantic result value in your grammar to return the number 3 for the word "three".
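
A grammar with semantic values could look roughly like this. Treat it as a sketch, not a drop-in replacement for your code: the `NumberGrammar` class and the file name `3-converted.wav` are placeholders, and it assumes the audio has already been converted to a supported format.

```csharp
using System;
using System.Speech.Recognition;

public static class NumberGrammar
{
    static readonly string[] Words =
    {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Builds a grammar in which each spoken word carries its digit as the semantic value.
    public static Grammar Build()
    {
        var choices = new Choices();
        for (int i = 0; i < Words.Length; i++)
        {
            choices.Add(new GrammarBuilder(new SemanticResultValue(Words[i], i)));
        }
        return new Grammar(new GrammarBuilder(choices));
    }

    public static void Main()
    {
        using (var reco = new SpeechRecognitionEngine())
        {
            reco.LoadGrammar(Build());
            reco.SetInputToWaveFile("3-converted.wav"); // placeholder file name

            // Synchronous recognition keeps the console example short.
            RecognitionResult result = reco.Recognize();
            if (result != null)
                Console.WriteLine("{0} -> {1}", result.Text, result.Semantics.Value);
            else
                Console.WriteLine("Nothing recognized.");
        }
    }
}
```

With this setup the grammar only contains the spoken words, and the numeric value comes from `Semantics.Value` rather than from a second, competing choice such as "3".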
