Unable to create multiple TTS WAV files using MS-SAPI 5.1 in C#

Posted on 2024-10-06 06:08:47

Greetings folks!

I'm working on a project where I will have to create WAV files of names using TTS.

I have the MS-SAPI 5.1 SDK installed on Windows Server 2003 and use C# to write the TTS program. Apart from the default Microsoft Sam voice, I have voices from NeoSpeech TTS installed on the server.

The issue I'm having is that the program does not produce more than one working WAV file.

To be more specific, if I send 4 names to the program, the program creates 4 WAV files. However, only the first name is converted correctly: its file is larger than 1 KB and plays in a media player.

The other 3 files are created, but they are only 1 KB in size and do not play in any media player.

I'm new to both C# and MS-SAPI but I believe I have done a decent job creating the code. I have spent days trying to figure this out but I'm out of energy now.

Any insight on this issue is greatly appreciated. Thanks for your time.

Here is my code:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Text;
using SpeechLib;
using System.Threading;

namespace TTS_Text_To_Wav
{
    class Gender
    {
        public static String MALE = "Male";
        public static String FEMALE = "Female";
    }

    class Languages
    {
        public static String ENGLISH = "409;9";
        public static String SPANISH = "40a";
    }

    class Vendor
    {
        public static String VOICEWARE = "Voiceware";
        public static String MICROSOFT = "Microsoft";
    }

    class SampleTTS
    {
        static void Main(string[] args)
        {
            SampleTTS processor = null;

            try
            {
                processor = new SampleTTS();

                // get unprocessed items
                ArrayList unProcessedItems = new ArrayList();
                unProcessedItems.Add("Kate");
                unProcessedItems.Add("Sam");
                unProcessedItems.Add("Paul");
                unProcessedItems.Add("Violeta");

                if (unProcessedItems != null)
                {
                    foreach (string record in unProcessedItems)
                    {
                        // convert text to wav
                        processor.ConvertStringToSpeechWav(record, "c:/temp/" + record + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH);
                    }
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }

        void ConvertStringToSpeechWav(String textToConvert, String pathToCreateWavFile, String vendor, String gender, String language)
        {
            SpVoice voice = null;
            SpFileStream spFileStream = null;

            try
            {
                spFileStream = new SpFileStream();
                voice = new SpVoice();

                spFileStream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
                spFileStream.Open(pathToCreateWavFile, SpeechStreamFileMode.SSFMCreateForWrite, false);

                voice.Voice = voice.GetVoices("Vendor=" + vendor + ";Gender=" + gender, "Language=" + language).Item(0);
                voice.AudioOutputStream = spFileStream;
                voice.Speak(textToConvert, SpeechVoiceSpeakFlags.SVSFlagsAsync | SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
                voice.WaitUntilDone(Timeout.Infinite);
            }
            catch (Exception e)
            {
                throw new Exception("Error occurred in ConvertStringToSpeechWav()\n" + e.Message, e);
            }
            finally
            {
                if (spFileStream != null)
                {
                    spFileStream.Close();
                }
            }
        }
    }
}

Edit:

I seem to notice some new behavior. The code works fine for Microsoft voices on the system. It is only with the NeoSpeech voices I seem to have this issue.

Does that mean my code is correct and something is wrong with the voices? For one, I got the voices from my client, so there is nothing I can do about them. Secondly, these are production-ready voices. I'm pretty sure they are well tested, or we would have heard a lot about it.

I'm still inclined to believe something is up with the code I wrote.

Are there any other suggestions available? I'm in a real fix here and any help will be appreciated.

花开雨落又逢春i 2024-10-13 06:08:47

While I don't see anything glaring that is causing the TTS issue, there are some best practices and code simplifications you could be using.

First off, SampleTTS (the class that contains Main()) doesn't need to be instantiated in order to call ConvertStringToSpeechWav(), so the processor instance created here is unnecessary:

class SampleTTS
{
    static void Main(string[] args)
    {
        SampleTTS processor = null;

        try
        {
            processor = new SampleTTS();

The SampleTTS class can be rewritten as follows (ConvertStringToSpeechWav() must then be declared static so that Main() can call it directly):

class SampleTTS
{
    static void Main(string[] args)
    {
        try
        {
            // get unprocessed items
            List<String> unProcessedItems = new List<String>();
            unProcessedItems.Add("Kate");
            unProcessedItems.Add("Sam");
            unProcessedItems.Add("Paul");
            unProcessedItems.Add("Violeta");

            foreach (string record in unProcessedItems)
            {
                // convert text to wav
                ConvertStringToSpeechWav(record, "c:/temp/" + record + ".wav", Vendor.VOICEWARE, Gender.MALE, Languages.ENGLISH);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }

Note that I also changed the list from ArrayList to List<String>. This is a best practice: List<T> is type safe, needs no casts when reading items back, and avoids the boxing that ArrayList incurs for value types. I also removed the if (unProcessedItems != null) check, since the list is instantiated just above, so it is either non-null or the constructor has already thrown an exception.
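To make the type-safety point concrete, here is a minimal sketch. The class and method names (ListVersusArrayList, AllReadableAsStrings) are made up for illustration and are not part of the original program.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class ListVersusArrayList
{
    // Hypothetical helper: returns true if every element can be read back as a string.
    public static bool AllReadableAsStrings(IEnumerable items)
    {
        try
        {
            // foreach casts each element to string and throws if one is not a string.
            foreach (string item in items) { }
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        // ArrayList stores plain objects, so a wrong type compiles and fails only at run time.
        ArrayList untyped = new ArrayList();
        untyped.Add("Kate");
        untyped.Add(42);
        Console.WriteLine(AllReadableAsStrings(untyped)); // prints "False"

        // List<string> rejects the wrong type at compile time.
        List<string> typed = new List<string>();
        typed.Add("Kate");
        // typed.Add(42); // would not compile
        Console.WriteLine(AllReadableAsStrings(typed)); // prints "True"
    }
}
```

With only four names the performance difference is irrelevant here; the practical gain is that mistakes surface at compile time.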

Lastly, you're creating a new voice object each time ConvertStringToSpeechWav() is called:

voice = new SpVoice();

and letting the GC clean it up. Have you tried calling GC.Collect() as PauloPinto suggested above, just to see if it works? You don't have to stick to rigid coding principles just to get something working. The goal should always be to write clean, principled code, but getting the code into a working state comes first; you can then refactor as needed.
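If you want to try that experiment, here is a rough sketch of what it could look like. It is untested against the NeoSpeech voices, and I can't promise it fixes the problem. The class name TtsSketch, the method name SpeakToWav and its parameters are hypothetical. The SpeechLib calls are the ones already used in your code, except that Speak() is called synchronously with SVSFDefault instead of SVSFlagsAsync followed by WaitUntilDone().

```csharp
using System;
using System.Runtime.InteropServices;
using SpeechLib; // SAPI 5.1 COM interop assembly, as referenced in the question

static class TtsSketch
{
    // Hypothetical variant of ConvertStringToSpeechWav() that releases its COM objects
    // explicitly instead of waiting for the garbage collector to get around to them.
    public static void SpeakToWav(string text, string wavPath,
                                  string requiredAttributes, string optionalAttributes)
    {
        SpFileStream stream = new SpFileStream();
        SpVoice voice = new SpVoice();
        bool opened = false;

        try
        {
            stream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
            stream.Open(wavPath, SpeechStreamFileMode.SSFMCreateForWrite, false);
            opened = true;

            voice.Voice = voice.GetVoices(requiredAttributes, optionalAttributes).Item(0);
            voice.AudioOutputStream = stream;

            // Synchronous call: returns only after the text has been rendered to the stream.
            voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault);
        }
        finally
        {
            if (opened)
            {
                stream.Close();
            }

            // Drop the COM references right away, then force a collection so that any
            // remaining wrappers (format and token objects) are finalized before the next call.
            Marshal.ReleaseComObject(voice);
            Marshal.ReleaseComObject(stream);
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}
```

You would call it once per name, for example SpeakToWav(record, "c:/temp/" + record + ".wav", "Vendor=Voiceware;Gender=Male", "Language=409;9"). If the later files still come out at 1 KB, object lifetime is probably not the cause and the voice engine itself becomes the more likely suspect.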

I hope some of this helps.

Cheers.
