Streaming TTS with SpeechSynthesizer via SpeechAudioFormatInfo

Posted 2024-09-26 17:32:38


I am using System.Speech.Synthesis.SpeechSynthesizer to convert text to speech. Due to Microsoft's anemic documentation (see my link; there are no remarks or code examples), I'm having trouble making heads or tails of the difference between two methods:

SetOutputToAudioStream and SetOutputToWaveStream.

Here's what I have deduced:

SetOutputToAudioStream takes a stream and a SpeechAudioFormatInfo instance that defines the format of the wave file (samples per second, bits per sample, audio channels, etc.) and writes the audio to the stream.

SetOutputToWaveStream takes just a stream and writes a 16-bit, mono, 22 kHz PCM wave file to the stream. There is no way to pass in a SpeechAudioFormatInfo.

My problem is that SetOutputToAudioStream doesn't write a valid wave file to the stream. For example, I get an InvalidOperationException ("The wave header is corrupt") when passing the stream to System.Media.SoundPlayer. If I write the stream to disk and attempt to play it with WMP, I get a "Windows Media Player cannot play the file..." error, yet the stream written by SetOutputToWaveStream plays properly in both. My theory is that SetOutputToAudioStream is not writing a (valid) header.

Strangely, the naming convention for the SetOutputTo* methods is inconsistent: SetOutputToWaveFile takes a SpeechAudioFormatInfo while SetOutputToWaveStream does not.

I need to be able to write an 8 kHz, 16-bit, mono wave file to a stream, something that neither SetOutputToAudioStream nor SetOutputToWaveStream allows me to do. Does anybody have insight into SpeechSynthesizer and these two methods?
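
Conceptually, the piece SetOutputToAudioStream omits is just the RIFF/WAVE container around the raw samples. As a hedged, language-neutral sketch (Python stdlib here, since System.Speech is Windows/.NET-only; the function name and the silent-sample input are made up for illustration), wrapping headerless 8 kHz, 16-bit, mono PCM in a valid wave container looks like this:

```python
import io
import wave

def wrap_pcm_in_wav(pcm: bytes, rate: int = 8000, bits: int = 16, channels: int = 1) -> bytes:
    """Wrap headerless PCM samples in a RIFF/WAVE container so players accept them."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(bits // 8)   # sample width in bytes
        w.setframerate(rate)
        w.writeframes(pcm)          # header sizes are finalized when the writer closes
    return buf.getvalue()

# 100 silent 16-bit mono frames of made-up "synthesizer output"
wav_bytes = wrap_pcm_in_wav(b"\x00\x00" * 100)
```

The resulting bytes start with a standard 44-byte header, which is exactly what SoundPlayer's "wave header is corrupt" complaint is about.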

For reference, here's some code:

// Requires: using System.IO; using System.Speech.AudioFormat; using System.Speech.Synthesis;
Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
  synth.SelectVoice(voiceName);
  synth.SetOutputToWaveStream(ret);   // writes a complete wave file, but only in the default format
  //synth.SetOutputToAudioStream(ret, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
  synth.Speak(textToSpeak);
}

Solution:

Many thanks to @Hans Passant; here is the gist of what I'm using now:

// Requires: using System.IO; using System.Reflection; using System.Speech.AudioFormat; using System.Speech.Synthesis;
Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
  // Reflect into the non-public SetOutputStream method, which does accept a SpeechAudioFormatInfo
  var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
  var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
  mi.Invoke(synth, new object[] { ret, fmt, true, true });
  synth.SelectVoice(voiceName);
  synth.Speak(textToSpeak);
}
return ret;

For my rough testing it works great. Using reflection is a bit icky, but it's better than writing the file to disk and opening a stream on it.


徒留西风 2024-10-03 17:32:38


Your code snippet is borked: you're using synth after it is disposed. But I'm sure that's not the real problem. SetOutputToAudioStream produces the raw PCM audio, just the 'numbers', without the container file format (headers) used in a .wav file. So no, that cannot be played back with a regular media program.
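
To make the "numbers without a container" point concrete: a canonical PCM .wav file is nothing more than a 44-byte RIFF header followed by those raw samples. A sketch of that header (Python for illustration; the field layout follows the standard RIFF/WAVE format, and `riff_header` is a made-up name):

```python
import struct

def riff_header(n_bytes: int, rate: int, bits: int, channels: int) -> bytes:
    """Build the canonical 44-byte RIFF/WAVE header for plain PCM data."""
    block_align = channels * bits // 8      # bytes per frame
    byte_rate = rate * block_align          # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + n_bytes, b"WAVE",     # RIFF chunk: file size minus 8
        b"fmt ", 16,                        # fmt chunk is 16 bytes for plain PCM
        1,                                  # audio format 1 = uncompressed PCM
        channels, rate, byte_rate, block_align, bits,
        b"data", n_bytes,                   # data chunk: the raw samples follow
    )

hdr = riff_header(200, 8000, 16, 1)
```

This is the part SetOutputToAudioStream never emits, and why media players reject its output.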

The missing overload for SetOutputToWaveStream that takes a SpeechAudioFormatInfo is strange. It really does look like an oversight to me, even though that's extremely rare in the .NET framework. There's no compelling reason why it shouldn't work, the underlying SAPI interface does support it. It can be hacked around with reflection to call the private SetOutputStream method. This worked fine when I tested it but I can't vouch for it:

using System.Reflection;
...
using (Stream ret = new MemoryStream())
using (SpeechSynthesizer synth = new SpeechSynthesizer()) {
    var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
    var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Eight, AudioChannel.Mono);
    mi.Invoke(synth, new object[] { ret, fmt, true, true });
    synth.Speak("Greetings from stack overflow");
    // Testing code:
    using (var fs = new FileStream(@"c:\temp\test.wav", FileMode.Create, FileAccess.Write, FileShare.None)) {
        ret.Position = 0;
        byte[] buffer = new byte[4096];
        for (;;) {
            int len = ret.Read(buffer, 0, buffer.Length);
            if (len == 0) break;
            fs.Write(buffer, 0, len);
        }
    }
}

If you're uncomfortable with the hack then using Path.GetTempFileName() to temporarily stream it to a file will certainly work.
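
That fallback is a generic pattern: let a file-only API write to a temp path, read the bytes back into memory, delete the file. Sketched here in Python with tempfile standing in for Path.GetTempFileName() (the callback is a made-up stand-in for SetOutputToWaveFile, since the speech API itself is .NET-only):

```python
import os
import tempfile
from pathlib import Path

def capture_via_temp_file(write_to_path) -> bytes:
    """Give a file-only API a temp path, then slurp the bytes back into memory."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)                      # the callee opens the path itself
    try:
        write_to_path(path)           # stand-in for synth.SetOutputToWaveFile(path, fmt); synth.Speak(...)
        return Path(path).read_bytes()
    finally:
        os.remove(path)

# Demo with a fake "synthesizer" that just writes a marker
data = capture_via_temp_file(lambda p: Path(p).write_bytes(b"RIFF...fake"))
```

The cost is one disk round-trip per utterance, which is why the reflection hack above is attractive when it works.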
