Streaming input to System.Speech.Recognition.SpeechRecognitionEngine
I am trying to do "streaming" speech recognition in C# from a TCP socket. The problem I am having is that SpeechRecognitionEngine.SetInputToAudioStream() seems to require a Stream of a defined length which can seek. Right now the only way I can think to make this work is to repeatedly run the recognizer on a MemoryStream as more input comes in.
Here's some code to illustrate:
SpeechRecognitionEngine appRecognizer = new SpeechRecognitionEngine();
System.Speech.AudioFormat.SpeechAudioFormatInfo formatInfo = new System.Speech.AudioFormat.SpeechAudioFormatInfo(8000, System.Speech.AudioFormat.AudioBitsPerSample.Sixteen, System.Speech.AudioFormat.AudioChannel.Mono);
NetworkStream stream = new NetworkStream(socket,true);
appRecognizer.SetInputToAudioStream(stream, formatInfo);
// The line above throws a NotSupportedException: "This stream does not support seek operations."
Does anyone know how to get around this? It must support streaming input of some sort, since it works fine with the microphone using SetInputToDefaultAudioDevice().
Thanks, Sean
I got live speech recognition working by overriding the stream class:
... and using an instance of that as the stream input to the SetInputToAudioStream method. As soon as the stream reports a length, or a read returns fewer bytes than were requested, the recognition engine treats the input as finished. The overridden stream therefore acts as a circular buffer that never finishes.
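The original code for this answer did not survive on this page, so the following is only a minimal sketch of the idea described above, not the answerer's class. The name BlockingAudioStream, the ring-buffer capacity, and the placeholder values returned by Length, Position and Seek are all assumptions; in particular, it is assumed (based on the description above) that the engine tolerates a stream that reports no real length as long as Read never returns fewer bytes than requested.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical class: a ring buffer exposed as a Stream whose Read blocks
// until the full requested count is available, so the consumer never sees
// a short read while the stream is open.
public sealed class BlockingAudioStream : Stream
{
    private readonly byte[] _ring;
    private readonly object _gate = new object();
    private int _readPos;   // index of the oldest unread byte
    private int _count;     // number of unread bytes in the ring
    private bool _closed;

    public BlockingAudioStream(int capacity = 1 << 16)
    {
        _ring = new byte[capacity];
    }

    public override bool CanRead { get { return true; } }
    public override bool CanWrite { get { return true; } }
    public override bool CanSeek { get { return false; } }

    // Placeholder values (assumption): never report a usable length or position.
    public override long Length { get { return -1L; } }
    public override long Position { get { return 0L; } set { } }
    public override long Seek(long offset, SeekOrigin origin) { return 0L; }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int copied = 0;
        lock (_gate)
        {
            while (copied < count)
            {
                while (_count == 0)
                {
                    if (_closed) return copied;   // short read only after Dispose
                    Monitor.Wait(_gate);          // wait for a writer
                }
                // Copy the largest contiguous run of unread bytes.
                int n = Math.Min(count - copied, Math.Min(_count, _ring.Length - _readPos));
                Buffer.BlockCopy(_ring, _readPos, buffer, offset + copied, n);
                _readPos = (_readPos + n) % _ring.Length;
                _count -= n;
                copied += n;
                Monitor.PulseAll(_gate);          // wake a writer waiting for space
            }
        }
        return copied;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        int written = 0;
        lock (_gate)
        {
            while (written < count)
            {
                while (_count == _ring.Length)
                {
                    if (_closed) throw new ObjectDisposedException("BlockingAudioStream");
                    Monitor.Wait(_gate);          // ring is full: wait for the reader
                }
                int writePos = (_readPos + _count) % _ring.Length;
                // Copy into the largest contiguous run of free space.
                int n = Math.Min(count - written,
                                 Math.Min(_ring.Length - _count, _ring.Length - writePos));
                Buffer.BlockCopy(buffer, offset + written, _ring, writePos, n);
                _count += n;
                written += n;
                Monitor.PulseAll(_gate);          // wake a blocked reader
            }
        }
    }

    protected override void Dispose(bool disposing)
    {
        lock (_gate) { _closed = true; Monitor.PulseAll(_gate); }
        base.Dispose(disposing);
    }
}
```

In use, one thread would copy bytes from the NetworkStream into this stream with Write, while the recognizer is given the same instance through SetInputToAudioStream. This sketch blocks the writer when the ring is full rather than overwriting old audio; that is a design choice made here, not something stated in the answer.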
Have you tried wrapping the network stream in a System.IO.BufferedStream?
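Before relying on this suggestion, it may be worth checking whether the wrapper actually changes what the engine sees. The small sketch below uses an anonymous pipe as a stand-in for the NetworkStream (an assumption made only so the check runs without a socket; the class and method names are hypothetical). BufferedStream reports the CanSeek value of the stream it wraps, so wrapping a non-seekable stream does not by itself make it seekable.

```csharp
using System;
using System.IO;
using System.IO.Pipes;

// Hypothetical helper: checks whether BufferedStream adds seek support
// on top of a non-seekable stream (a pipe stands in for the socket stream).
public static class BufferedStreamCheck
{
    public static bool WrappedCanSeek()
    {
        using (var pipe = new AnonymousPipeServerStream(PipeDirection.Out))
        using (var buffered = new BufferedStream(pipe))
        {
            // BufferedStream.CanSeek mirrors the underlying stream's CanSeek.
            return buffered.CanSeek;
        }
    }
}
```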
Apparently it can't be done ("By design"!). See http://social.msdn.microsoft.com/Forums/en/netfxbcl/thread/fcf62d6d-19df-4ca9-9f1f-17724441f84e
This is my solution.
How to Use:
I ended up buffering the input and then sending it to the speech recognition engine in successively larger chunks. For instance, I might first send the first 0.25 seconds, then the first 0.5 seconds, then the first 0.75 seconds, and so on until I get a result. I am not sure this is the most efficient approach, but it yields satisfactory results for me.
Best of luck, Sean
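The growing-prefix approach described above could look roughly like the sketch below. It is not Sean's code: the class name GrowingPrefixRecognizer, the 0.25-second step, and the final attempt on end of stream are assumptions, and the byte rate is derived from the 8 kHz, 16-bit, mono format in the question. It also assumes the engine already has a grammar loaded (for example a DictationGrammar) before the method is called.

```csharp
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

// Hypothetical helper: re-runs recognition on ever longer prefixes of the
// audio received so far, until the engine returns a result.
public static class GrowingPrefixRecognizer
{
    // 8000 samples/s * 2 bytes per sample * 1 channel (format from the question).
    public const int BytesPerSecond = 8000 * 2;
    // Retry each time another 0.25 s of audio has arrived.
    public const int StepBytes = BytesPerSecond / 4;

    public static RecognitionResult Recognize(
        Stream source, SpeechRecognitionEngine engine, SpeechAudioFormatInfo format)
    {
        var received = new MemoryStream();
        var chunk = new byte[StepBytes];
        int nextAttempt = StepBytes;
        int read;
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            received.Write(chunk, 0, read);
            if (received.Length < nextAttempt) continue;   // not enough new audio yet

            RecognitionResult result = RecognizePrefix(engine, format, received, nextAttempt);
            if (result != null) return result;
            nextAttempt += StepBytes;
        }
        // The sender closed the connection: try once more on everything received.
        return received.Length > 0
            ? RecognizePrefix(engine, format, received, (int)received.Length)
            : null;
    }

    private static RecognitionResult RecognizePrefix(
        SpeechRecognitionEngine engine, SpeechAudioFormatInfo format,
        MemoryStream received, int length)
    {
        // A read-only MemoryStream over the first `length` bytes is seekable
        // and has a defined length, which is what the engine asks for.
        using (var prefix = new MemoryStream(received.GetBuffer(), 0, length, false))
        {
            engine.SetInputToAudioStream(prefix, format);
            RecognitionResult result = engine.Recognize(); // null if nothing was recognized
            engine.SetInputToNull();                        // detach before the prefix is disposed
            return result;
        }
    }
}
```

Each attempt reprocesses all audio from the start, so the total work grows quadratically with the utterance length; a larger step trades latency for fewer repeated passes.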