Best practices for specifying pronunciation for the Android TTS engine?

Posted 2024-09-14 05:38:16

In general, I'm very impressed with Android's default text-to-speech engine (i.e., com.svox.pico). As expected, it mispronounces some words (as do I), so it occasionally needs pronunciation guidance. I'm therefore wondering about best practices for phonetically spelling out the words that the Pico TTS engine mispronounces.

For example, the correct pronunciation of the bird Chachalaca is CHAH-chah-LAH-kah. Here is what the TTS engine produces:

mTts.speak("Chachalaca", TextToSpeech.QUEUE_ADD, null); // output: chuh-KAL-uh-KUH
mTts.speak("CHAH-chah-LAH-kah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-EL-AY-AYCH-dash-kuh
mTts.speak("CHAHchahLAHkah", TextToSpeech.QUEUE_ADD, null); // output: CHA-chah-LAH-ka
mTts.speak("CHAH chah LOCKah", TextToSpeech.QUEUE_ADD, null); // output: CHAH-chah-LAH-kah

Here are my questions.

1. Is there a standard phonetic spelling recognized by the Android TTS engine?
2. If not, are there some general rules for making custom pronunciation spellings that will make the spellings more likely to be correct in future TTS engines/versions?
3. It appears that the Android TTS engine ignores text case. What is the best way to specify emphasis?

By the way, this is what the TTS engine writes to logcat:

V/TtsService( 294): TTS processing: CHAH chah LOCKah
V/TtsService( 294): TtsService.setLanguage(eng, USA, )
I/SVOX Pico Engine( 294): Language already loaded (en-US == en-US)
I/SynthProxy( 294): setting speech rate to 100
I/SynthProxy( 294): setting pitch to 100

[UPDATE]

I tried passing an XML document to TextToSpeech.speak() as follows:

            String text = "<?xml version=\"1.0\"?>" +
                "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
                    "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
                    "xsi:schemaLocation=\"http://www.w3.org/2001/10/synthesis " +
                        "http://www.w3.org/TR/speech-synthesis/synthesis.xsd\" " +
                    "xml:lang=\"en-US\">" +

                    "That is a big car! " +
                    "That <emphasis>is</emphasis> a big car! " +
                    "That is a <emphasis>big</emphasis> car! " +
                    "That is a huge bank account! " +
                    "That <emphasis level=\"strong\">is</emphasis> a huge bank account! " +
                    "That is a <emphasis level=\"strong\">huge</emphasis> bank account!" +
                "</speak>";
            mTts.speak(text, TextToSpeech.QUEUE_ADD, null);

As Android Eve suggested, the TTS engine read only the XML body (i.e., the comments about the big car and the huge bank account). I didn't realize the TTS engine was capable of parsing XML documents. However, I did not hear any emphasis in the TTS output.

[UPDATE 2]

I simplified the question to whether or not Android TTS supports Speech Synthesis Markup Language (SSML) and asked it separately.

浅暮の光 2024-09-21 05:38:16

JW answered my question at the tts-for-android group:

Hi Greg,

The Pico engine recognizes the <phoneme> tag with the XSAMPA alphabet.

There are no easy rules to derive a certain pronunciation from the orthography, but you can use intuitive spellings and trial and error. Capitalization and hyphens will introduce more problems than they solve. Using different spellings and introducing extra word boundaries (spaces) can work.
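
One way to apply the respelling advice is to keep the trial-and-error results in a small lookup table and substitute them into the text just before it is spoken. The following is only a sketch: RespellingLexicon and its methods are hypothetical names, not part of the Android API, and the sample respelling is simply the one reported as working in the question.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: swaps known-mispronounced words for respellings before speaking. */
final class RespellingLexicon {
    // Matches runs of ASCII letters; an assumption that is good enough for English text.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    private final Map<String, String> respellings = new LinkedHashMap<>();

    /** Registers a respelling; lookups ignore the case of the original word. */
    RespellingLexicon add(String word, String respelling) {
        respellings.put(word.toLowerCase(Locale.ROOT), respelling);
        return this;
    }

    /** Returns the text with every registered whole word replaced by its respelling. */
    String apply(String text) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the untouched text between the previous word and this one.
            out.append(text, last, m.start());
            String respelling = respellings.get(m.group().toLowerCase(Locale.ROOT));
            out.append(respelling != null ? respelling : m.group());
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

With a lexicon built as new RespellingLexicon().add("Chachalaca", "CHAH chah LOCKah"), the call would become mTts.speak(lexicon.apply(text), TextToSpeech.QUEUE_ADD, null). Keeping the respellings in one place also makes them easier to retest if a later engine version pronounces them differently.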

The emphasis tag and the exclamation mark will not change the synthesis result. Use , , and commands instead.

Some examples of the proper syntax for specifying the pronunciation using the SSML phoneme tag are in these tests of TextToSpeech.

Even with these simple test SSML documents, warnings are posted to logcat saying that the SSML document is not well-formed. So I opened an issue about these seemingly incorrect logcat messages on the Android issue tracker.

The syntax for specifying an x-SAMPA sequence to SVOX pico is

String text = "<speak xml:lang=\"en-US\"> <phoneme alphabet=\"xsampa\" ph=\"d_ZIn\"/>.</speak>";
mTts.speak(text, TextToSpeech.QUEUE_ADD, null);
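
X-SAMPA uses characters such as " (primary stress) and & that are not safe inside a double-quoted XML attribute, so it may help to build the string with a small helper rather than by hand. This is a minimal sketch: PhonemeSsml is a hypothetical name, and whether Pico decodes the escaped entities correctly is an assumption I have not verified.

```java
/** Hypothetical helper that builds the speak/phoneme string in the form shown above. */
final class PhonemeSsml {
    private PhonemeSsml() {}

    /** Escapes the characters that are unsafe inside a double-quoted XML attribute. */
    static String escapeAttribute(String value) {
        // The ampersand must be replaced first so later entities are not double-escaped.
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    /** Wraps an X-SAMPA sequence in a speak document for the given language tag. */
    static String xsampa(String languageTag, String sequence) {
        return "<speak xml:lang=\"" + escapeAttribute(languageTag) + "\"> "
             + "<phoneme alphabet=\"xsampa\" ph=\"" + escapeAttribute(sequence) + "\"/>."
             + "</speak>";
    }
}
```

For the example above, mTts.speak(PhonemeSsml.xsampa("en-US", "d_ZIn"), TextToSpeech.QUEUE_ADD, null) would pass the same string to the engine.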

Although more examples would be helpful, a good reference for x-SAMPA is at http://en.wikipedia.org/wiki/Xsampa. If I compile a couple dozen examples, I'll post them to that Wikipedia page.

高冷爸爸 2024-09-21 05:38:16

One answer for all 3 questions: Look at the SSML specifications: http://www.w3.org/TR/speech-synthesis/

For example, to specify emphasis, you use the emphasis element, e.g.

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
         xml:lang="en-US">
  That is a <emphasis> big </emphasis> car!
  That is a <emphasis level="strong"> huge </emphasis>
  bank account!
</speak>
