Unable to create multiple TTS WAV files using MS-SAPI 5.1 in C#

Posted on 2024-10-06 06:08:47

Greetings folks!

I'm working on a project where I will have to create WAV files of names using TTS.

I have the MS-SAPI 5.1 SDK installed on Windows Server 2003 and use C# to write the TTS program. Apart from the default Microsoft Sam voice, I have voices from NeoSpeech TTS installed on the server.

The issue I'm having is that the program does not produce more than 1 working WAV file.

To be more specific, if I send 4 names to the program, it creates 4 WAV files. However, only the first name is converted correctly: its file is larger than 1 KB and plays in a media player.

The other 3 files are created but are 1 KB in size and do not play in any media player.

I'm new to both C# and MS-SAPI but I believe I have done a decent job creating the code. I have spent days trying to figure this out but I'm out of energy now.

Any insight on this issue is greatly appreciated. Thanks for your time.

Here is my code:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Text;
using SpeechLib;
using System.Threading;

namespace TTS_Text_To_Wav
{
    class Gender
    {
        public static String MALE = "Male";
        public static String FEMALE = "Female";
    }

    class Languages
    {
        public static String ENGLISH = "409;9";
        public static String SPANISH = "40a";
    }

    class Vendor
    {
        public static String VOICEWARE = "Voiceware";
        public static String MICROSOFT = "Microsoft";
    }

    class SampleTTS
    {
        static void Main(string[] args)
        {
            SampleTTS processor = null;

            try
            {
                processor = new SampleTTS();

                // get unprocessed items
                ArrayList unProcessedItems = new ArrayList();
                unProcessedItems.Add("Kate");
                unProcessedItems.Add("Sam");
                unProcessedItems.Add("Paul");
                unProcessedItems.Add("Violeta");

                if (unProcessedItems != null)
                {
                    foreach (string record in unProcessedItems)
                    {
                        // convert text to wav
                        processor.ConvertStringToSpeechWav(record, "c:/temp/" + record + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }

        void ConvertStringToSpeechWav(String textToConvert, String pathToCreateWavFile, String vendor, String gender, String language)
        {
            SpVoice voice = null;
            SpFileStream spFileStream = null;

            try
            {
                spFileStream = new SpFileStream();
                voice = new SpVoice();

                spFileStream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
                spFileStream.Open(pathToCreateWavFile, SpeechStreamFileMode.SSFMCreateForWrite, false);

                voice.Voice = voice.GetVoices("Vendor=" + vendor + ";Gender=" + gender, "Language=" + language).Item(0);
                voice.AudioOutputStream = spFileStream;
                voice.Speak(textToConvert, SpeechVoiceSpeakFlags.SVSFlagsAsync | SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
                voice.WaitUntilDone(Timeout.Infinite);
            }
            catch (Exception e)
            {
                throw new Exception("Error occured in ConvertStringToSpeechWav()\n" + e.Message);
            }
            finally
            {
                if (spFileStream != null)
                {
                    spFileStream.Close();
                }
            }
        }
    }
}

Edit:

I seem to notice some new behavior. The code works fine for Microsoft voices on the system. It is only with the NeoSpeech voices I seem to have this issue.

Does that mean my code is correct and something is wrong with the voices? For one, I got the voice from my clients so there is nothing I can do about it. Secondly these are production ready voices. I'm pretty sure they are well tested or we would have heard a lot about it.

I'm still inclined to believe something is up with the code I wrote.

Are there any other suggestions available? I'm in a real fix here and any help will be appreciated.

花开雨落又逢春i 2024-10-13 06:08:47

While I don't see anything glaring that is causing the TTS issue, there are some best practices and code simplifications you could be using.

First off, SampleTTS, the class that contains Main(), doesn't need to be instantiated in order to call ConvertStringToSpeechWav():

class SampleTTS
{
    static void Main(string[] args)
    {
        SampleTTS processor = null;

        try
        {
            processor = new SampleTTS();

The SampleTTS class can be rewritten as follows:

class SampleTTS
{
    static void Main(string[] args)
    {
        try
        {
            // get unprocessed items
            List<String> unProcessedItems = new List<String>();
            unProcessedItems.Add("Kate");
            unProcessedItems.Add("Sam");
            unProcessedItems.Add("Paul");
            unProcessedItems.Add("Violeta");

            foreach (string record in unProcessedItems)
            {
                // convert text to wav; calling it without an instance requires
                // ConvertStringToSpeechWav to be declared static
                ConvertStringToSpeechWav(record, "c:/temp/" + record + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }

Note that I also changed the list from ArrayList to List<String> as a best practice, because List<T> performs better than ArrayList and is type safe. I also removed the if (unProcessedItems != null) check, since you're already instantiating the list above, so it will either be non-null or the constructor will throw an exception.

Lastly you're creating a new voice object each time ConvertStringToSpeechWav() is called:

voice = new SpVoice();

and letting the GC clean it up. Have you tried calling GC.Collect() like PauloPinto suggested above, just to see if it works? You don't have to stick to rigid coding principles just to get something working. The goal should always be to code cleanly and with principles, but more so to get your code into a working state, and then refactor as needed.
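One way to avoid creating a new voice per call is to create a single SpVoice and reuse it for every name. The following is only a sketch under assumptions: the class and method names (ReuseVoiceSketch, ConvertAll) are hypothetical, the project is assumed to reference the SpeechLib interop assembly as in the question, voice selection is left out so the default voice is used, and whether this helps with the NeoSpeech voices is untested.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using SpeechLib;

static class ReuseVoiceSketch
{
    // Hypothetical helper: one SpVoice for the whole batch, one file stream per name.
    public static void ConvertAll(IEnumerable<string> names, string outputDir)
    {
        SpVoice voice = new SpVoice();

        foreach (string name in names)
        {
            SpFileStream stream = new SpFileStream();
            stream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
            stream.Open(Path.Combine(outputDir, name + ".wav"),
                        SpeechStreamFileMode.SSFMCreateForWrite, false);
            try
            {
                // Point the shared voice at the new file, then speak and wait,
                // using the same async-then-wait pattern as the question's code.
                voice.AudioOutputStream = stream;
                voice.Speak(name, SpeechVoiceSpeakFlags.SVSFlagsAsync);
                voice.WaitUntilDone(Timeout.Infinite);
            }
            finally
            {
                // Close each file before moving on to the next name.
                stream.Close();
            }
        }
    }
}
```

The only change from the original method is the lifetime of the voice object, which makes it easier to tell whether per-call voice creation is related to the problem.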

I hope some of this helps.

Cheers.

離人涙 2024-10-13 06:08:47

It's been a while since I did TTS, but from what I recall the Speak method can run asynchronously, so subsequent calls may be colliding with the first while it is still being rendered.

It looks like you're requesting that explicitly with the "SpeechVoiceSpeakFlags.SVSFlagsAsync" flag, so try changing that first.
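A minimal sketch of that change, assuming the same SpeechLib interop reference as the question: passing SVSFDefault makes Speak block until the text has been rendered, so no separate WaitUntilDone call is needed. The class and method names here are hypothetical, and voice selection is omitted, so the default voice is used.

```csharp
using SpeechLib;

static class SyncSpeakSketch
{
    // Hypothetical helper: render one string to a WAV file with a blocking Speak call.
    public static void SpeakToWav(string text, string path)
    {
        SpFileStream stream = new SpFileStream();
        stream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
        stream.Open(path, SpeechStreamFileMode.SSFMCreateForWrite, false);
        try
        {
            SpVoice voice = new SpVoice();
            voice.AudioOutputStream = stream;

            // SVSFDefault: synchronous, returns only after speaking has finished.
            voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault);
        }
        finally
        {
            stream.Close();
        }
    }
}
```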

☆獨立☆ 2024-10-13 06:08:47

I was having a similar issue, except that I was using voices from a different vendor (not NeoSpeech) and the problem only appeared after some 300 or so WAV files had been generated successfully.

But the symptom was the same: all wav files that didn't work were less than 1K in size.
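Since the symptom is so consistent, a batch run can be checked for it automatically. This is a rough sketch, not part of either program discussed here: the class name WavSizeCheck and the 1024-byte threshold are assumptions based on the sizes reported in this thread, so adjust the threshold to whatever a valid file looks like for your audio format.

```csharp
using System.Collections.Generic;
using System.IO;

static class WavSizeCheck
{
    // Assumed threshold: files at or below this size are treated as failed output.
    const long SuspectBytes = 1024;

    // Returns the paths of .wav files in the directory that look too small to be valid.
    public static List<string> FindSuspectFiles(string directory)
    {
        List<string> suspects = new List<string>();
        foreach (string path in Directory.GetFiles(directory, "*.wav"))
        {
            if (new FileInfo(path).Length <= SuspectBytes)
            {
                suspects.Add(path);
            }
        }
        return suspects;
    }
}
```

Calling WavSizeCheck.FindSuspectFiles on the output folder after a run gives the list of names that need to be regenerated.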

I noticed that moving the failed lines to the top of the list still produced a similar result: the initial 300 or so lines succeeded (even though some of those lines had failed in the previous run). So the problem was not the lines themselves, but rather an issue to do with how much was being processed.

I couldn't find any way to 'reset' the speech system so I tried calling the Garbage Collector every 100 lines. It worked!

So I'd suggest you try:

GC.Collect();

at the end of your ConvertStringToSpeechWav function.
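A sketch of what that could look like, with one addition that is not from the suggestion above: the COM objects are also released explicitly with Marshal.ReleaseComObject rather than relying on the collector alone. The class and method names are hypothetical, the SpeechLib interop reference from the question is assumed, and the default voice is used.

```csharp
using System;
using System.Runtime.InteropServices;
using SpeechLib;

static class ReleaseSketch
{
    // Hypothetical helper: render one string, then release the SAPI objects
    // and force a collection before the next call.
    public static void SpeakToWavAndRelease(string text, string path)
    {
        SpFileStream stream = null;
        SpVoice voice = null;
        bool opened = false;
        try
        {
            stream = new SpFileStream();
            stream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
            stream.Open(path, SpeechStreamFileMode.SSFMCreateForWrite, false);
            opened = true;

            voice = new SpVoice();
            voice.AudioOutputStream = stream;
            voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault);
        }
        finally
        {
            // Only close a stream that was actually opened.
            if (opened) stream.Close();

            // Addition beyond the suggestion: drop the COM references right away.
            if (voice != null) Marshal.ReleaseComObject(voice);
            if (stream != null) Marshal.ReleaseComObject(stream);

            // The suggestion itself: force a collection so any remaining
            // wrappers are finalized before the next file is generated.
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}
```

Forcing a collection on every call is slow for large batches, so collecting every N files, as described above, is a reasonable middle ground.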
