How many words for a VoiceXML grammar?
I want to have a dynamic grammar in my VoiceXML file (reading individual products and building the grammar with PHP).
My question is whether there is any advice or experience on how many words should be written into the source from which I read the products.
I don't know much about the structure or pronunciation of the words, so let's say
a) the words are rather different from each other
b) the words rather have the same structure or pronunciation
c) a mix of a) and b)
thanks in advance
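For illustration, here is a minimal sketch of the kind of grammar-builder described above, in Python rather than PHP; the function name, product names, and rule id are all made-up examples, and a real SRGS grammar may need additional attributes depending on the platform.

```python
# Minimal sketch: build an SRGS grammar (XML form) from a product list.
# The question uses PHP; this Python version only illustrates the structure.

import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

def build_product_grammar(products):
    """Return an SRGS grammar string with one <item> per product name."""
    grammar = ET.Element(
        "grammar",
        {
            "xmlns": SRGS_NS,
            "version": "1.0",
            "xml:lang": "en-US",
            "root": "products",
            "mode": "voice",
        },
    )
    rule = ET.SubElement(grammar, "rule", {"id": "products", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for name in products:
        item = ET.SubElement(one_of, "item")
        item.text = name
    return ET.tostring(grammar, encoding="unicode")

if __name__ == "__main__":
    print(build_product_grammar(["red widget", "blue widget", "green gadget"]))
```

The PHP equivalent would simply loop over the product rows from the database and emit the same `<one-of>`/`<item>` structure.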
I'm assuming you mean SRGS grammars when you indicate a dynamic grammar for VoiceXML.
Unfortunately, you're going to have to do performance testing under a reasonable load to really know for sure. I've successfully transmitted grammars with 1M+ entries under certain conditions. I've also done 10,000-name lists. I've also come across platforms that can only utilize a few dozen entries.
The speech recognition (ASR) and VoiceXML platform are going to have a significant impact on your results. And, the number of concurrent recognitions with this grammar will also be relevant along with the overall recognition load.
The factors you mention do have an impact on recognition performance and CPU load, but I've typically found the size of the grammar and the length/variability of entries to matter more. For example, yes/no grammars typically have a much higher CPU load than complex menu grammars (short phrases tend to require more passes and leave open a larger number of possibilities during processing). I've seen some horrible numbers from wide-ranging digit grammars (9-31 digit grammars). The sounds are short and difficult to disambiguate. The variability in components, again, creates a large number of paths that have to be continuously checked for a solution. Most menu or natural speaking phrases have longer words that sound significantly different, so many paths can be quickly excluded.
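As a sketch of the wide-ranging digit grammar described above, in SRGS XML form: the wide `repeat` range is exactly what multiplies the competing recognition paths (the rule id is an arbitrary example).

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="digits" mode="voice">
  <rule id="digits" scope="public">
    <!-- 9 to 31 digits: the open-ended repeat range forces the recognizer
         to keep many candidate paths alive while processing -->
    <item repeat="9-31">
      <one-of>
        <item>zero</item> <item>one</item> <item>two</item> <item>three</item>
        <item>four</item> <item>five</item> <item>six</item> <item>seven</item>
        <item>eight</item> <item>nine</item>
      </one-of>
    </item>
  </rule>
</grammar>
```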
Some tips:
Most enterprise-class ASR systems support a cache. If you can identify grammars with URL parameters and set any HTTP header information the ASR needs (don't assume they follow the standards), you may see a significant performance boost.
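As a sketch of that caching tip, assuming a dynamic-grammar HTTP endpoint: the helper name and header values below are illustrative, and the exact headers a given ASR honors must be checked against its documentation.

```python
# Sketch: cache-friendly HTTP headers for serving a dynamic SRGS grammar,
# so the ASR's fetch cache can reuse an already-compiled grammar.

import hashlib

def grammar_response_headers(grammar_xml: str, max_age: int = 3600) -> dict:
    """Headers for a dynamic-grammar endpoint; a stable ETag lets the ASR
    revalidate cheaply instead of refetching and recompiling the grammar."""
    etag = hashlib.md5(grammar_xml.encode("utf-8")).hexdigest()
    return {
        "Content-Type": "application/srgs+xml",   # XML form of SRGS
        "Cache-Control": f"max-age={max_age}",    # allow client-side caching
        "ETag": f'"{etag}"',                      # content fingerprint
    }
```

Because the ETag is derived from the grammar content, two requests with the same URL parameters and the same product list yield the same ETag, which is what allows the cache hit.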
Prompts can often hide grammar loading/compiling phases. If you have a relatively long prompt that people will tend to barge into, you'll find that you can hide some fairly large grammar fetches. Again, not all platforms do a good job of processing these tasks in parallel. Note that most ASR engines can collect audio and perform end-pointing while still fetching and compiling the grammar. This buys you more time, but you'll see the impact in longer latencies.
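A hedged VoiceXML sketch of that pattern, with a barge-in-enabled prompt covering the grammar fetch (the field name, grammar URL, and prompt text are hypothetical):

```xml
<field name="product">
  <!-- bargein="true" lets callers interrupt; on platforms that parallelize,
       the grammar fetch below can overlap with prompt playback -->
  <prompt bargein="true">
    Please say the name of the product you are looking for.
  </prompt>
  <grammar src="products.php?category=widgets" type="application/srgs+xml"/>
</field>
```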
Most ASR engines provide tools that let you analyze a grammar with sample audio. The tools will usually give you a CPU resource indicator. I've rarely found that you can calculate/predict overall performance due to the complexities around recognition concurrency, but they can give you a comparative impact relative to other grammars. I have yet to find an engine that makes it easy to track grammar processing times; it can be difficult to even roughly guess concurrency challenges. In most cases, large-scale testing has been necessary.
After grammar load/compile times, recognition concurrency is the most significant performance impact. I've seen a few applications that have highly complex grammars near the beginning of the call. There were high levels of recognition concurrency without an opportunity to cache (a platform issue at the time), which led to scaling challenges (intermittent, large latencies in recognition processing).