C# (C++) SAPI - TTS - how to get speech timing for the text being read

Posted on 2024-12-26 17:55:19

Can anyone help me? I have been searching for an example of how to get information about the text spoken by TTS through SAPI. (I am writing my application in C#, but that is not essential; SAPI is the same in C++, etc.)
The information I need is, for example, the following.
The user types into a text box:

"This is a Text"..

tts.Speak("This is a text"); // this will "read" it..

OK, nice... but I also need to get information about the "timing"..

for example:

"Th" (first sound (phoneme) of "This") was "read" in 0.01ms..

"i" (first sound of "is") was "read" in 0.5ms..

"e" (second sound of "Text") was "read" in 1.02ms..

When I save the .wav file generated by SAPI, I need this timing information for the .wav so that I can do subsequent "processing" of the wav file.

Sorry for my English and for the poor description of my problem, but I think the problem is very simple and everyone will understand it. If not, I will try to describe it again :) ^^..

Comments (1)

够钟 2025-01-02 17:55:19

I have used C++ and SAPI 5.1 to synthesize speech and have a virtual character move its lips accordingly. Here is some code that works with visemes. According to the documentation at http://msdn.microsoft.com/en-us/library/ms720164(v=vs.85).aspx, phoneme events work the same way; just replace SPEI_VISEME with SPEI_PHONEME.

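// Thread procedure: speaks param->message and reads SAPI viseme events as they arrive.
// Returns 1 on success and 0 on failure (a thread procedure returns DWORD, not bool).
DWORD WINAPI Character::sayMessage(LPVOID lpParam){
    HRESULT hres = E_FAIL;
    bool comInitialized = false;
    try{
        //remember whether CoInitialize succeeded so it can be balanced below
        comInitialized = SUCCEEDED(::CoInitialize(NULL));
        ThreadParam * param = (ThreadParam *)lpParam;
        wstring s = param->message;

        //http://msdn.microsoft.com/en-us/library/ms720163(VS.85,classic).asp is my source for this
        //set up text to speech

        //get the voice associated with the character
        ISpVoice * pVoice = param->sceneObject->characterVoice;

        //nothing to speak for an empty string or a missing voice
        if (!s.empty() && pVoice != NULL){
            //stop anything this voice is still speaking
            pVoice->Speak( NULL, SPF_PURGEBEFORESPEAK, 0 );

            //only queue viseme and end-of-stream events
            const ULONGLONG interest = SPFEI(SPEI_VISEME)|SPFEI(SPEI_END_INPUT_STREAM);
            pVoice->SetInterest(interest, interest);
            //have SAPI signal a Win32 event, so the loop below can wait instead of spinning
            //(this replaces the original callback registration, whose eventFunction was not shown)
            pVoice->SetNotifyWin32Event();

            if (param->sceneObject->age == CHILD){
                s = L"<pitch middle=\"+10\">" + s + L"</pitch>";
            }

            hres = pVoice->Speak(s.c_str(),SPF_ASYNC,NULL);

            SPEVENT event;
            ULONG fetched = 0;
            bool isDone = false;
            while(!isDone && SUCCEEDED(hres)){
                //wake up when events are queued, or after 100 ms as a safety net
                pVoice->WaitForNotifyEvent(100);
                //drain every event that is currently queued
                while(!isDone && pVoice->GetEvents(1,&event,&fetched) == S_OK){
                    if(event.eEventId==SPEI_VISEME){
                        //get the viseme; event.ullAudioStreamOffset is its position in the output audio stream
                        int vis = LOWORD(event.lParam);  //handle it however you'd like after this
                    }
                    else if(event.eEventId==SPEI_END_INPUT_STREAM){
                        isDone = true;
                    }
                }
            }
        }
    }
    catch(...){
        hres = E_FAIL;
    }
    if (comInitialized) ::CoUninitialize();
    return SUCCEEDED(hres) ? 1 : 0;
}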
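
The question asks for timing, which the loop above does not record. As a rough sketch of one way to get it: each SPEVENT also carries ullAudioStreamOffset, which the SAPI documentation describes as the byte offset of the event in the audio stream, and for SPEI_PHONEME and SPEI_VISEME events LOWORD(event.wParam) is documented as the duration in milliseconds. Dividing the offset by the average bytes per second of the output format (WAVEFORMATEX::nAvgBytesPerSec) should give the position in the .wav, assuming the offset marks where the sound begins. The names TimedSound, offsetToMs, decodeEvent, lowWord and highWord below are hypothetical helpers, not SAPI APIs; the code is kept free of <windows.h> so the arithmetic can be checked on its own.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for the Win32 LOWORD/HIWORD macros, so this sketch
// compiles without <windows.h>.
constexpr std::uint16_t lowWord(std::uint64_t v) {
    return static_cast<std::uint16_t>(v & 0xFFFF);
}
constexpr std::uint16_t highWord(std::uint64_t v) {
    return static_cast<std::uint16_t>((v >> 16) & 0xFFFF);
}

// Hypothetical record for one decoded SPEI_PHONEME or SPEI_VISEME event.
struct TimedSound {
    std::uint16_t id;          // current phoneme or viseme: LOWORD(event.lParam)
    std::uint16_t durationMs;  // duration in milliseconds: LOWORD(event.wParam)
    double startMs;            // position in the output audio, in milliseconds
};

// Converts a byte offset in the audio stream to milliseconds.
// avgBytesPerSec corresponds to WAVEFORMATEX::nAvgBytesPerSec, which for PCM
// is sample rate * channels * bytes per sample.
inline double offsetToMs(std::uint64_t byteOffset, std::uint32_t avgBytesPerSec) {
    assert(avgBytesPerSec != 0);
    return static_cast<double>(byteOffset) * 1000.0 / avgBytesPerSec;
}

// Builds a TimedSound from the raw fields of an SPEVENT:
// event.wParam, event.lParam and event.ullAudioStreamOffset.
inline TimedSound decodeEvent(std::uint64_t wParam, std::uint64_t lParam,
                              std::uint64_t audioStreamOffset,
                              std::uint32_t avgBytesPerSec) {
    return TimedSound{ lowWord(lParam), lowWord(wParam),
                       offsetToMs(audioStreamOffset, avgBytesPerSec) };
}
```

Inside the event loop this would be called as decodeEvent(event.wParam, event.lParam, event.ullAudioStreamOffset, bytesPerSec). As an assumed example, 16-bit mono PCM at 22050 Hz is 44100 bytes per second, so an event at byte offset 22050 falls 500 ms into the file.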