C# speech recognition - is this what the user said?
I need to write an application, using a speech recognition engine (either the built-in Vista one or a third-party one), that can display a word or phrase and recognise when the user reads it (or an approximation of it). I also need to be able to switch quickly between languages without changing the language of the operating system.
The users will be using the system for very short periods. The application needs to work without the requirement of first training the recognition engine to the users' voices.
It would also be fantastic if this could work on Windows XP or lesser versions of Windows Vista.
Optionally, the system should be able to read information on the screen back to the user, in the user's selected language. I can work around this requirement using pre-recorded voice-overs, but the preferred method would be a text-to-speech engine.
Can anyone recommend something for me?
If it's the engine you're asking about, here's what I've found (beware: I'm just listing these; I haven't tried any of them):
LumenVox engine
You also have the SAPI SDK from Microsoft itself. I've only tried it for text-to-speech, but according to its description:
The SDK also includes freely distributable text-to-speech (TTS) engines (in U.S. English and Simplified Chinese) and speech recognition (SR) engines (in U.S. English, Simplified Chinese, and Japanese).
Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.
That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.
That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/
Check out the new Speech class libraries in .NET 3.5
general documentation for SR and TTS
Text-to-speech is available through the Speech API. Personally, I'd probably require Vista and use the managed interfaces in System.Speech.Recognition and System.Speech.Synthesis.TtsEngine, but calling into the unmanaged APIs (via P/Invoke or COM interop) should be possible if you really need XP support.
Try Microsoft Speech Server, which I think is now part of Office Communications Server 2007. It contains SR/TTS engines, a C# API, and tools that integrate with Visual Studio.
This is the article from MSDN magazine that first discussed using the System.Speech APIs for Vista. Some of it is out of date because the API changed between beta (when the article was written) and the release of Vista, but this is still one of the best resources I've found and covers a good intro to the System.Speech namespace. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx
The Dragon NaturallySpeaking SDK might be worth looking at.
This project looked interesting.
I haven't had a chance to play with either of them, though.
Well, this question already has many good responses, but I think it is worth updating Rob Segal's and Philipp Schmid's answers with some information from the 2016 documentation, which points to this nice code example:
https://msdn.microsoft.com/en-us/library/office/system.speech.recognition.speechrecognitionengine.aspx
It does not use Windows' shared recognizer (the little Windows microphone bar that shows up in the middle of the screen); it uses an in-app SpeechRecognitionEngine that does not need any visual cue, so the UI is completely under your control.
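A minimal sketch of that in-process pattern, assuming a project reference to System.Speech. The class name `InAppListener` and the grammar name `"phrases"` are made up for illustration; it prints what it hears instead of updating a UI, and it only listens for the phrases you pass in.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical wrapper around an in-process SpeechRecognitionEngine.
// Unlike the shared SpeechRecognizer, this does not bring up the
// Windows Speech Recognition UI; the application owns the audio input.
public sealed class InAppListener : IDisposable
{
    private readonly SpeechRecognitionEngine engine;

    public InAppListener(params string[] phrases)
    {
        engine = new SpeechRecognitionEngine(); // system's default recognizer
        engine.LoadGrammar(BuildGrammar(phrases));
        engine.SpeechRecognized += (s, e) =>
            Console.WriteLine($"Heard \"{e.Result.Text}\" (confidence {e.Result.Confidence:F2})");
        engine.SpeechRecognitionRejected += (s, e) =>
            Console.WriteLine("Rejected: input did not match the grammar confidently.");
    }

    // Builds a grammar that accepts exactly one of the given phrases.
    public static Grammar BuildGrammar(string[] phrases)
    {
        var grammar = new Grammar(new GrammarBuilder(new Choices(phrases)));
        grammar.Name = "phrases";
        return grammar;
    }

    public void Start()
    {
        engine.SetInputToDefaultAudioDevice();
        engine.RecognizeAsync(RecognizeMode.Multiple); // keep listening until stopped
    }

    public void Stop() => engine.RecognizeAsyncCancel();

    public void Dispose() => engine.Dispose();
}
```

Usage might look like `using (var l = new InAppListener("yes", "no")) { l.Start(); Console.ReadLine(); }`.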
A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this... with some limitations. Add a reference to System.Speech (it should be in the GAC) to your project. Here's some sample code for a WinForms app:
This recognizes the numbers from 1 to 100, and displays the resulting number on the form. You'll need a form with a label called lblLetter on it.
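A minimal sketch of such a form, assuming references to System.Speech and System.Windows.Forms. It follows the later advice in this thread of listing spelled-out words ("forty two") rather than digits in the `Choices`. `NumberForm` and `NumberWords` are hypothetical names, and the label is created in code rather than in the designer.

```csharp
using System;
using System.Collections.Generic;
using System.Speech.Recognition;
using System.Windows.Forms;

// Hypothetical helper: spells out 1..100 as space-separated words ("forty two").
public static class NumberWords
{
    private static readonly string[] Ones =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };
    private static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    public static string ToWords(int n)
    {
        if (n < 1 || n > 100) throw new ArgumentOutOfRangeException(nameof(n));
        if (n == 100) return "one hundred";
        if (n < 20) return Ones[n];
        return n % 10 == 0 ? Tens[n / 10] : Tens[n / 10] + " " + Ones[n % 10];
    }
}

// Hypothetical form: listens for "one" .. "one hundred" and shows the number in lblLetter.
public class NumberForm : Form
{
    private readonly Label lblLetter = new Label { AutoSize = true };
    private readonly SpeechRecognitionEngine engine = new SpeechRecognitionEngine();
    private readonly Dictionary<string, int> numberForPhrase =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public NumberForm()
    {
        Controls.Add(lblLetter);
        var choices = new Choices();
        for (int n = 1; n <= 100; n++)
        {
            string phrase = NumberWords.ToWords(n);
            numberForPhrase[phrase] = n; // remember which number each phrase stands for
            choices.Add(phrase);
        }
        engine.LoadGrammar(new Grammar(new GrammarBuilder(choices)));
        engine.SpeechRecognized += OnSpeechRecognized;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        // Start listening only once the window handle exists, so BeginInvoke is safe.
        engine.SetInputToDefaultAudioDevice();
        engine.RecognizeAsync(RecognizeMode.Multiple);
    }

    private void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        if (numberForPhrase.TryGetValue(e.Result.Text, out int n))
        {
            // Marshal to the UI thread in case the event arrives on a background thread.
            BeginInvoke((Action)(() => lblLetter.Text = n.ToString()));
        }
    }

    protected override void OnFormClosed(FormClosedEventArgs e)
    {
        engine.RecognizeAsyncCancel();
        engine.Dispose();
        base.OnFormClosed(e);
    }
}
```

It would be started the usual way, e.g. `Application.Run(new NumberForm());` from an `[STAThread]` `Main`.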
System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)
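For the question's scenario (show a phrase, then check whether the user read it), one possible sketch loads a small grammar with the target phrase plus a few distractor phrases, so the engine is less likely to force an unrelated utterance onto the only option, and then accepts the result only above a confidence threshold. `PhraseChecker`, the distractor idea, and the 0.6 cutoff are assumptions to tune against real users, not documented values.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical helper for "did the user just read this phrase?"
public static class PhraseChecker
{
    // Assumed cutoff; tune it with real speakers.
    public const float DefaultMinConfidence = 0.6f;

    // Pure decision rule: same phrase (ignoring case/whitespace) and confident enough.
    public static bool Accept(string expected, string heard, float confidence,
                              float minConfidence = DefaultMinConfidence)
    {
        return heard != null
            && string.Equals(expected.Trim(), heard.Trim(), StringComparison.OrdinalIgnoreCase)
            && confidence >= minConfidence;
    }

    // Listens once for `expected` using a recognizer for `culture`.
    // Pass an empty array if you have no distractor phrases.
    public static bool UserSaid(string expected, string[] distractors,
                                CultureInfo culture, TimeSpan timeout)
    {
        using (var engine = new SpeechRecognitionEngine(culture))
        {
            var choices = new Choices(expected);
            if (distractors.Length > 0) choices.Add(distractors);

            // Keep the grammar's culture in line with the recognizer's.
            var builder = new GrammarBuilder(choices) { Culture = engine.RecognizerInfo.Culture };
            engine.LoadGrammar(new Grammar(builder));
            engine.SetInputToDefaultAudioDevice();

            // Waits up to `timeout` for speech to start; null if nothing was recognized.
            RecognitionResult result = engine.Recognize(timeout);
            return result != null && Accept(expected, result.Text, result.Confidence);
        }
    }
}
```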
It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).
As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(
As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program to let me type, hit Enter, and read the text aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
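A small sketch of such a program's core that also picks a voice for the user's chosen language, as the question asks. It assumes a reference to System.Speech; `Narrator`, `FindVoice`, and the optional wave-file path are hypothetical. Which languages actually work depends on which voices are installed on the machine; if none matches, it falls back to the default voice.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Speech.Synthesis;

// Hypothetical helper: read text aloud in a chosen language if a voice exists for it.
public static class Narrator
{
    // Returns the name of an enabled installed voice for `culture`, or null if there is none.
    public static string FindVoice(CultureInfo culture)
    {
        using (var synth = new SpeechSynthesizer())
        {
            return synth.GetInstalledVoices(culture)
                        .Where(v => v.Enabled)
                        .Select(v => v.VoiceInfo.Name)
                        .FirstOrDefault();
        }
    }

    // Speaks `text`; writes to `wavPath` instead of the speakers when a path is given.
    public static void Speak(string text, CultureInfo culture, string wavPath = null)
    {
        using (var synth = new SpeechSynthesizer())
        {
            string voice = FindVoice(culture);
            if (voice != null) synth.SelectVoice(voice); // otherwise keep the default voice

            if (wavPath != null) synth.SetOutputToWaveFile(wavPath);
            else synth.SetOutputToDefaultAudioDevice();

            synth.Speak(text);   // synchronous
            synth.SetOutputToNull(); // release the output (closes the wave file, if any)
        }
    }
}
```

For example, `Narrator.Speak("Bonjour", new CultureInfo("fr-FR"));` uses a French voice only if one is installed.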
[Note: I was the development lead for the managed speech recognition API in .NET 3.0]
System.Speech is part of .NET 3.0, so it is available on both Vista and XP. On Vista you have the added benefit of a speech recognition engine pre-installed with the OS. On XP your choices are: use the SAPI 5.1 SDK with a very old engine (which might still work well enough for your command-and-control scenario), or install Office 2003, which installs a newer version of the recognizer. There are also a few SAPI 5-compliant speech recognition engines available.
If you need to switch languages, you will want to use the System.Speech.Recognition.SpeechRecognitionEngine class which allows you to choose the SR engine for the language you need to support. Note that engines are defined by a set of languages they support (they might be using the same binary, only swapping data files to support additional languages).
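A sketch of that approach, assuming a reference to System.Speech: look up an installed recognizer by culture, independent of the OS display language, and build the grammar with the same culture. `RecognizerFactory` and its methods are hypothetical names; which cultures are available depends entirely on the engines installed.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical helper for choosing a recognizer per language.
public static class RecognizerFactory
{
    // Returns the installed recognizer whose culture matches `culture` exactly, or null.
    public static RecognizerInfo Find(CultureInfo culture)
    {
        return SpeechRecognitionEngine.InstalledRecognizers()
            .FirstOrDefault(r => r.Culture.Name.Equals(culture.Name, StringComparison.OrdinalIgnoreCase));
    }

    // Creates an engine for `culture` with a grammar for `phrases`.
    public static SpeechRecognitionEngine Create(CultureInfo culture, params string[] phrases)
    {
        RecognizerInfo info = Find(culture)
            ?? throw new InvalidOperationException("No speech recognizer installed for " + culture.Name);

        var engine = new SpeechRecognitionEngine(info);
        // Build the grammar in the recognizer's own culture to avoid a culture mismatch.
        var builder = new GrammarBuilder(new Choices(phrases)) { Culture = info.Culture };
        engine.LoadGrammar(new Grammar(builder));
        return engine;
    }
}
```

Switching languages then means disposing the current engine and calling `Create` again with the new culture, e.g. `RecognizerFactory.Create(new CultureInfo("ja-JP"), ...)`, before calling `SetInputToDefaultAudioDevice()` and starting recognition.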
Comment if you need to know more.
Philipp
Before this, add a reference to System.Speech.
I found that the code example posted by Kyralessa on Oct 22nd didn't work for me, but a slightly revised version did. When adding strings to the Choices object, use spelled-out English words, not digits; it seems the MS speech recognition engine can't recognize numerals on their own.
I have marked these modifications with comments added to the previous example.