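Using TTS on Android: punctuation is read aloud

Posted on 2024-12-22 05:09:22

Context: My application sends sentences to whatever TTS engine the user has installed. The sentences are user-generated and may contain punctuation.

Problem: Some users report that punctuation is read aloud on SVOX, Loquendo and possibly other engines (the TTS says "comma", etc.).

Questions:

Should I strip all punctuation?
Should I use this kind of API to transform the punctuation?
Should I let the TTS engine handle punctuation?

The same user who sees the problem with Loquendo does not have it in another Android app called FBReader. So I guess the third option is not the right one.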

怪我入戏太深 2024-12-29 05:09:22

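I ran into the same problem in one of my apps.

The input string was:

Next alarm in 10 minutes,it will be 2:45 PM

and the TTS engine would say:

Next alarm in 10 minutes comma it will be 2:45 PM

Simply adding a space after the comma fixed the problem, like this:

Next alarm in 10 minutes, it will be 2:45 PM

It was a silly mistake, and maybe your problem is more complex than that, but this worked for me. :)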
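If the sentences come from users, the same fix can be applied in code before calling the engine. A minimal sketch in plain Java, assuming the only problem is a comma or semicolon glued directly to the next word; the class and method names (CommaSpacing, addSpaceAfterCommas) are hypothetical:

```java
import java.util.regex.Pattern;

/** Hypothetical helper: inserts the missing space after a comma or semicolon glued to the next word. */
final class CommaSpacing {

    // A comma or semicolon immediately followed by a letter, e.g. "minutes,it".
    // Digits are deliberately not matched, so numbers such as "1,000" are left alone.
    private static final Pattern GLUED = Pattern.compile("([,;])(?=\\p{L})");

    static String addSpaceAfterCommas(String text) {
        // "$1 " re-inserts the matched comma/semicolon followed by a space.
        return GLUED.matcher(text).replaceAll("$1 ");
    }
}
```

With this, "Next alarm in 10 minutes,it will be 2:45 PM" becomes "Next alarm in 10 minutes, it will be 2:45 PM", while "1,000 steps" is returned unchanged.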

何时共饮酒 2024-12-29 05:09:22

So, you're worried about what back-alley-acquired text-to-speech engine the user might happen to have selected as their default... presumably because you don't want your app to look bad due to this engine's unknown/bad behavior. Understandable.

The (good) fact is, though, that the TTS's behavior is not actually your responsibility unless you decide to embed an engine in the app itself (Difficulty: Hard, Recommended? No).

Engines can and should be presumed to adhere to Android rules and behaviors dictated here... and presumed to supply their own sufficient set of configuration options in the Android system settings (home\settings\language&locale\TTS) which may or may not include pronunciation options. The user should also be presumed intelligent enough to install an engine that they are satisfied with.

It is a slippery slope to take on the job of anticipating and "correcting" for unknown and unwanted engine behaviors (at least in engines that you haven't tested yourself).

A SIMPLE AND GOOD OPTION (Difficulty: Easy):

Make a setting in your app: "ignore punctuation."
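A minimal sketch of what the "ignore punctuation" setting could do to the text before it is passed to speak(), in plain Java. The rules are my own assumptions, not something the engines require: punctuation between two digits ("2:45") and apostrophes or hyphens inside a word ("don't", "e-mail") are kept so times and contractions still read naturally, and every other punctuation character becomes a space. TtsTextPreparer and prepare() are hypothetical names:

```java
/**
 * Hypothetical "ignore punctuation" filter, applied to user text before it goes to the TTS engine.
 * Assumed rules: keep punctuation between two digits and apostrophes/hyphens inside words;
 * replace all other punctuation with a space.
 */
final class TtsTextPreparer {

    static String prepare(String text, boolean ignorePunctuation) {
        if (!ignorePunctuation) {
            return text; // setting off: the engine sees the text unchanged
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isPunctuation(c)) {
                out.append(c);
                continue;
            }
            char prev = i > 0 ? text.charAt(i - 1) : ' ';
            char next = i + 1 < text.length() ? text.charAt(i + 1) : ' ';
            boolean betweenDigits = Character.isDigit(prev) && Character.isDigit(next);
            boolean insideWord = (c == '\'' || c == '-')
                    && Character.isLetter(prev) && Character.isLetter(next);
            out.append(betweenDigits || insideWord ? c : ' ');
        }
        // Trim the ends, then collapse the runs of spaces left behind.
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    /** True for any character in one of the Unicode punctuation categories. */
    private static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }
}
```

For example, with the setting on, "Next alarm in 10 minutes, it will be 2:45 PM." becomes "Next alarm in 10 minutes it will be 2:45 PM". Keeping the filter behind the setting means users with well-behaved engines still get the pauses that punctuation produces.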

A BETTER OPTION (Difficulty: Medium):

Do the above, but only show the "ignore punctuation" setting-option if the engine you have detected on the user's device is prone to this issue.
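One way to sketch the "only show it when needed" check, assuming you maintain your own list of engines that you have seen read punctuation aloud. On a device, the package name of the user's default engine can be obtained from TextToSpeech.getDefaultEngine(); the list contents below, including "com.svox.pico" for SVOX Pico, are placeholders to confirm with your own testing, and the class name is hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical lookup deciding whether the "ignore punctuation" option is shown at all. */
final class PunctuationProneEngines {

    // Package names of engines that read punctuation aloud in YOUR OWN tests.
    // "com.svox.pico" is an assumed placeholder, not a verified result; add other engines
    // (e.g. Loquendo) only once you have confirmed their package names and behavior.
    private static final Set<String> KNOWN_TO_SPEAK_PUNCTUATION =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("com.svox.pico")));

    /** Show the option only when the detected engine is on the list above. */
    static boolean shouldOfferIgnorePunctuation(String enginePackage) {
        return enginePackage != null && KNOWN_TO_SPEAK_PUNCTUATION.contains(enginePackage);
    }
}
```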

Also, one thing to note is that there are many, many differences between engines (whether they use embedded voices vs online, response time, initialization time, reliability/adherence to Android specs, behavior across Android API levels, behavior across their own version history, the quality of voices, not to mention language capability)... differences that may be even more important to users than whether or not punctuation is pronounced.

You say "My application is sending sentences to whatever TTS engine the user has." Well... "That's yer problem right there." Why not give the user a choice on what engine to use?

Which leads us to...

AN EVEN BETTER OPTION (Difficulty: Hard and Good! [in my humble opinion]):

Decide on some "known-good" engines your app will "support," starting with Google and Samsung. I would guess that fewer than 5% of devices out there these days have neither of those engines installed.
Study and test these engines as much as possible across all Android API levels that you plan to support... at least in as far as whether they pronounce punctuation or not.
Over time, test more engines if you like, and add them to your supported engines in subsequent app updates.
Run an algorithm when your app starts that detects which engines are installed, then use that info against your own list of supported engines:

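import android.content.Context;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.speech.tts.TextToSpeech;
import android.util.Log;

import java.util.ArrayList;
import java.util.List;

// Returns the package names of all installed engines that answer ACTION_CHECK_TTS_DATA.
// On Android 11 (API 30) and later, package visibility rules may hide other apps from this
// query unless the manifest declares a matching <queries> entry.
private ArrayList<String> whatEnginesAreInstalled(Context context) {
    final Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    final PackageManager pm = context.getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    ArrayList<String> installedEngineNames = new ArrayList<>();
    for (ResolveInfo r : list) {
        String engineName = r.activityInfo.applicationInfo.packageName;
        installedEngineNames.add(engineName);

        // just logging the version number out of interest
        String version = "null";
        try {
            version = pm.getPackageInfo(engineName, PackageManager.GET_META_DATA).versionName;
        } catch (PackageManager.NameNotFoundException e) {
            Log.i("XXX", "could not read version for " + engineName, e);
        }
        Log.i("XXX", "we found an engine: " + engineName);
        Log.i("XXX", "version: " + version);
    }
    return installedEngineNames;
}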
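Once whatEnginesAreInstalled() has returned the package names, comparing them against your supported list is plain Java. A sketch, assuming you keep the supported engines as package names; "com.google.android.tts" and "com.samsung.SMT" are my assumptions for the Google and Samsung engines (verify them on real devices), and EngineSupport is a hypothetical name. The three statuses line up with the settings behavior in the steps that follow: supported engines are offered even when missing, and unknown ones are flagged.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical comparison of installed TTS engines against the app's own supported list. */
final class EngineSupport {

    enum Status { SUPPORTED_INSTALLED, SUPPORTED_NOT_INSTALLED, UNKNOWN_INSTALLED }

    static final class Entry {
        final String packageName;
        final Status status;

        Entry(String packageName, Status status) {
            this.packageName = packageName;
            this.status = status;
        }
    }

    /**
     * Every supported engine comes first (installed or not, so it can still be offered in settings),
     * followed by any installed engine that is not on the supported list.
     */
    static List<Entry> classify(Collection<String> installed, Collection<String> supported) {
        List<Entry> result = new ArrayList<>();
        for (String pkg : supported) {
            Status s = installed.contains(pkg) ? Status.SUPPORTED_INSTALLED : Status.SUPPORTED_NOT_INSTALLED;
            result.add(new Entry(pkg, s));
        }
        for (String pkg : installed) {
            if (!supported.contains(pkg)) {
                result.add(new Entry(pkg, Status.UNKNOWN_INSTALLED));
            }
        }
        return result;
    }
}
```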

In your app's settings, present all engines that you've decided to support as options (even if not currently installed). This could be a simple group of RadioButtons with titles corresponding to the different engine names. If the user selects one that isn't installed, notify them of that and give them the option of installing it with an intent.
Save the user's selected engine name (String) in SharedPreferences, and use their selection as the last argument of the TextToSpeech constructor any time you need a TTS in your app.
If the user has some weird engine installed, present it as a choice also, even if it is unrecognized/unsupported, but inform them that they have selected an unknown/untested engine.
If the user selects an engine that is supported but is known to pronounce punctuation (bad), then upon selection of that engine, have an alert dialog pop up warning the user about that, explaining that they can turn this bad behavior off with the "ignore punctuation" setting referred to already.
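The decisions in these steps can be sketched as small pure functions, so the Android-specific parts (reading SharedPreferences, building the TextToSpeech, showing the dialog) stay thin. Assumptions: savedEngine is whatever you stored in SharedPreferences (for example read with getString(key, null)), and a null result is passed as the last argument of new TextToSpeech(context, listener, engine) on the assumption that the platform then uses the system default engine; check the TextToSpeech documentation for the API levels you support. EngineChoice and its methods are hypothetical:

```java
import java.util.Collection;

/** Hypothetical decision helpers for the user's saved TTS engine choice. */
final class EngineChoice {

    /**
     * The saved choice if it is still installed; otherwise null, which this sketch assumes
     * means "let TextToSpeech fall back to the system default engine".
     */
    static String engineToUse(String savedEngine, Collection<String> installed) {
        return (savedEngine != null && installed.contains(savedEngine)) ? savedEngine : null;
    }

    /** Supported engine that is known to read punctuation aloud: show the warning dialog. */
    static boolean shouldWarnAboutPunctuation(String chosen,
                                              Collection<String> supported,
                                              Collection<String> speaksPunctuation) {
        return chosen != null && supported.contains(chosen) && speaksPunctuation.contains(chosen);
    }

    /** Engine not on the supported list: tell the user it is unknown/untested. */
    static boolean isUntested(String chosen, Collection<String> supported) {
        return chosen != null && !supported.contains(chosen);
    }
}
```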

SIDE-NOTES:

Don't let the SVOX/PICO (emulator) engine get you too worried: it has many flaws and is not even designed or guaranteed to run on Android above API ~20, but it is still included in emulator images up to API ~24, producing "unpredictable results" that don't actually reflect reality. I have yet to see this engine on any real hardware device made within the last seven years or so.
Since you say that "sentences are user generated," I would be more worried about solving the problem of what language they are going to be typing in! I'll look out for a question on that! :)
