What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?
There are two similar namespaces and assemblies for speech recognition in .NET. I’m trying to understand the differences and when it is appropriate to use one or the other.
There is System.Speech.Recognition from the assembly System.Speech (in System.Speech.dll). System.Speech.dll is a core DLL in the .NET Framework class library, version 3.0 and later.
There is also Microsoft.Speech.Recognition from the assembly Microsoft.Speech (in Microsoft.Speech.dll). Microsoft.Speech.dll is part of the UCMA 2.0 SDK.
I find the docs confusing and I have the following questions:
System.Speech.Recognition says it is for "Windows Desktop Speech Technology"; does this mean it cannot be used on a server OS, or cannot be used in high-scale applications?
The UCMA 2.0 Speech SDK (http://msdn.microsoft.com/en-us/library/dd266409%28v=office.13%29.aspx) says that it requires Microsoft Office Communications Server 2007 R2 as a prerequisite. However, I've been told at conferences and meetings that if I do not require OCS features like presence and workflow, I can use the UCMA 2.0 Speech API without OCS. Is this true?
If I’m building a simple recognition app for a server application (say I wanted to automatically transcribe voice mails) and I don’t need features of OCS, what are the differences between the two APIs?
The short answer is that Microsoft.Speech.Recognition uses the Server version of SAPI, while System.Speech.Recognition uses the Desktop version of SAPI.
The APIs are mostly the same, but the underlying engines are different. Typically, the Server engine is designed to accept telephone-quality audio for command & control applications; the Desktop engine is designed to accept higher-quality audio for both command & control and dictation applications.
You can use System.Speech.Recognition on a server OS, but it's not designed to scale nearly as well as Microsoft.Speech.Recognition.
The differences are that the Server engine doesn't need training and works with lower-quality audio, but its recognition quality is lower than the Desktop engine's.
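To illustrate how close the two APIs are, here is a minimal sketch of my own (not from the answer above): the same recognition code compiles against either namespace, so switching engines is largely a matter of swapping the using directive and the referenced assembly.

```csharp
using System;
using System.Globalization;
// Swap this using (and the referenced DLL) to switch engines:
// using System.Speech.Recognition;   // Desktop SAPI (System.Speech.dll)
using Microsoft.Speech.Recognition;   // Server engine (Microsoft.Speech.dll)

class YesNoDemo
{
    static void Main()
    {
        // Both APIs expose a SpeechRecognitionEngine type; passing an
        // explicit culture selects a matching installed recognizer.
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // A trivial command & control grammar.
            var grammar = new Grammar(new GrammarBuilder(new Choices("yes", "no")));
            engine.LoadGrammar(grammar);
            engine.SetInputToDefaultAudioDevice();
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null ? result.Text : "(nothing recognized)");
        }
    }
}
```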
I found Eric’s answer really helpful; I just wanted to add some more details that I found.
System.Speech.Recognition can be used to program the desktop recognizers. SAPI and desktop recognizers have shipped with the Windows desktop releases (for example, Windows Vista and Windows 7 include both SAPI and an in-box recognizer).
Server editions of Windows come with SAPI, but no recognizer.
Desktop recognizers have also shipped in products like Office.
Microsoft.Speech.Recognition can be used to program the server recognizers. Server recognizers have shipped in server products such as Speech Server and Office Communications Server (OCS).
The complete SDK for version 10.2 of the Microsoft Server Speech Platform is available at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4. The speech engine is a free download. Version 11 is now available at http://www.microsoft.com/download/en/details.aspx?id=27226.
For Microsoft Speech Platform SDK 11 info and downloads, see:
- http://www.microsoft.com/en-us/download/details.aspx?id=27224
- engine: http://www.microsoft.com/en-us/download/details.aspx?id=27225
- http://www.microsoft.com/en-us/download/details.aspx?id=27226
Desktop recognizers are designed to run in-process (inproc) or shared. Shared recognizers are useful on the desktop, where voice commands are used to control any open application. Server recognizers can only run inproc. Inproc recognizers are used when a single application uses the recognizer, or when wav files or audio streams need to be recognized (shared recognizers can't process audio files, just audio from input devices).
Only desktop speech recognizers include a dictation grammar (a system-provided grammar used for free-text dictation). The class System.Speech.Recognition.DictationGrammar has no equivalent in the Microsoft.Speech namespace.
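As a concrete illustration of the two points above, here is a minimal sketch of my own (the file path is hypothetical): an inproc engine can read a wav file directly, and DictationGrammar is only available on the desktop side. This is essentially the questioner's voicemail-transcription scenario, done with the desktop engine.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition; // desktop API: DictationGrammar exists only here

class WavTranscriber
{
    static void Main()
    {
        // Inproc engine: required for file/stream input.
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.LoadGrammar(new DictationGrammar()); // free-text dictation
            engine.SetInputToWaveFile(@"C:\temp\voicemail.wav"); // hypothetical path
            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result != null ? result.Text : "(nothing recognized)");
        }
    }
}
```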
You can use the APIs to query and determine your installed recognizers.
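For example (a minimal sketch of my own; the same static call exists in both namespaces):

```csharp
using System;
using System.Speech.Recognition; // or Microsoft.Speech.Recognition

class ListRecognizers
{
    static void Main()
    {
        // Enumerates the recognizer tokens registered on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} ({1}) - {2}", info.Name, info.Culture, info.Description);
        }
    }
}
```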
I found that I can also see what recognizers are installed by looking at the registry keys:
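As I recall, the keys are along these lines (treat the exact paths as approximate; they vary by platform version):

```
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\Tokens                (desktop / SAPI)
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Recognizers\Tokens   (server / Speech Platform 11)
```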
--- Update ---
As discussed in "Microsoft Speech Recognition - what reference do I have to add?", Microsoft.Speech is also the API used for the Kinect recognizer. This is documented in the MSDN article http://msdn.microsoft.com/en-us/library/hh855387.aspx.
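Since Kinect feeds the recognizer a raw audio stream, the wiring is typically along these lines (a hedged sketch of my own: the stream is assumed to come from the Kinect SDK's audio source, which delivers 16 kHz, 16-bit, mono PCM):

```csharp
using System;
using System.Globalization;
using System.IO;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

class KinectSpeechSketch
{
    // 'kinectAudio' is assumed to be the PCM stream exposed by the Kinect SDK.
    static void StartRecognition(Stream kinectAudio, Grammar grammar)
    {
        var engine = new SpeechRecognitionEngine(new CultureInfo("en-US"));
        engine.LoadGrammar(grammar);
        // Describe the raw stream: PCM, 16 kHz, 16-bit, mono.
        engine.SetInputToAudioStream(kinectAudio,
            new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
        engine.SpeechRecognized += (s, e) => Console.WriteLine(e.Result.Text);
        engine.RecognizeAsync(RecognizeMode.Multiple); // keep listening
    }
}
```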
Here is the link for the Speech Library (MS Server Speech Platform):
Microsoft Server Speech Platform 10.1 Released (SR and TTS in 26 languages)
It seems Microsoft wrote an article that clears things up regarding the differences between the Microsoft Speech Platform and Windows SAPI: https://msdn.microsoft.com/en-us/library/jj127858.aspx.
A difference I found myself while converting speech recognition code for Kinect from Microsoft.Speech to System.Speech (see http://github.com/birbilis/Hotspotizer) is that the former supports SRGS grammars with tag-format="semantics/1.0-literals", while the latter doesn't; you have to convert to semantics/1.0 by changing each tag's content from x to out="x";.