Usability: speech recognition vs. keypad

Posted 2024-07-22 06:10:29

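We're seeing more and more speech recognition being implemented, and more requests for libraries that do speech recognition well. What's the rationale (in terms of usability) behind it versus a keyboard or keypad? What reasons would you have to invest in this development?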

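Take call centers as an example. A few years ago, almost every call center used an IVR that prompted for menu keys. Now we see more and more menus that prompt for a spoken keyword and/or a keypress: "Please say 'invoice' or press 1 for your invoices." We see the same thing in company directories: "Please say the name of the person you want to reach"... "Frank Lloyd"... "Did you say Jack Floyd? If so, say 'yes' to be connected to this person, or 'no' to try again."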
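A mixed-mode prompt such as "Please say 'invoice' or press 1" has to map both a keypress and a spoken keyword onto the same choice. The sketch below only illustrates that idea and is not any IVR platform's API: the menu contents, the `Input` structure and the 0.6 confidence threshold are all hypothetical, and a real system would express this in its platform's grammar or dialog format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical menu: each option can be reached by a DTMF key or a spoken keyword.
MENU = {
    "1": "invoice",
    "2": "payment",
    "0": "agent",
}

# Hypothetical threshold: speech results below this confidence are rejected.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Input:
    dtmf: Optional[str] = None    # key pressed by the caller, if any
    speech: Optional[str] = None  # keyword returned by the recognizer, if any
    confidence: float = 0.0       # recognizer confidence for `speech`


def resolve(inp: Input) -> Optional[str]:
    """Return the chosen menu option, or None to trigger a reprompt."""
    # A keypress is unambiguous, so it takes precedence over any speech result;
    # an unmapped key yields None.
    if inp.dtmf is not None:
        return MENU.get(inp.dtmf)
    # Accept speech only if the recognizer is confident and the word is on the menu.
    if inp.speech is not None and inp.confidence >= CONFIDENCE_THRESHOLD:
        keyword = inp.speech.strip().lower()
        if keyword in MENU.values():
            return keyword
    return None


print(resolve(Input(dtmf="1")))                          # invoice
print(resolve(Input(speech="Invoice", confidence=0.9)))  # invoice
print(resolve(Input(speech="invoice", confidence=0.3)))  # None
```

Giving DTMF precedence reflects the point made in the answers below that keypad input is far more reliable than speech.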

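I guess it's a plus when you're in your car without holding your phone, but is it worth the additional waiting time? Every choice takes longer to interact with, prompts take longer while the system tries to work out whether something was said, and so on. Also, reliability is definitely better than it was, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.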

Any experience designing IVR or software that used (or chose not to use) speech recognition?

Thanks!


子栖 2024-07-29 06:10:29

What's the rationale (in terms of usability) behind it versus a keyboard or keypad?

Usability is a very broad term. If I were to attempt to enter my address with a telephone keypad, it wouldn't be considered very usable. Some argue that a speech engine with an overall success rate of 70-80% isn't very usable either. As indicated in other posts, hands-free input can be much easier for those on a mobile phone. However, spoken words can actually be less intuitive than numeric input on a touch-tone phone if the topic is somewhat foreign to the caller. A caller hearing unfamiliar terms and phrases can't remember them over the 10-30 seconds of the prompt, but they can hover a finger over the key for the best-sounding choice or remember the order of the choices.

What reasons would you have to invest in this development?

This is an odd question. Usually the decision whether to use speech in an IVR environment is not driven by the development side. Unless you have a specific requirement that really calls for speech, you are almost always reducing overall success rates. Speech is usually a matter of corporate image ... or of having the latest technological toy.

I guess it's a plus when you're in your car without holding your phone, but is it worth the additional waiting time?

Speech recognition latencies aren't very high these days with modern ASR engines. In most cases, input is processed in parallel with the caller's speech, and the time between the end of speech and the recognition result is 0.5 to 1 s. Be aware that many IVRs then need to perform data look-ups after some inputs, which can make the system appear slower. Latencies beyond 1 s for ordinary inputs are usually the sign of an under-powered deployment.

It may not have been under-powered when originally implemented, but tuning involves a lot of performance-versus-accuracy decisions. Chasing that next 0.1% of accuracy can push resource usage beyond what the system can sustain at peak load.
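The 1 s rule of thumb can be turned into a simple check over measured latencies. This is a sketch under assumptions: the samples (end of speech to recognition result, in seconds) are assumed to come from your own logs, and judging by the 95th percentile (nearest-rank) instead of the mean is an illustrative choice, not something the answer prescribes.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0..100) of a non-empty sample."""
    ordered = sorted(samples)
    # 1-based rank of the nearest-rank percentile.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def looks_underpowered(latencies_s: Sequence[float],
                       limit_s: float = 1.0, pct: float = 95) -> bool:
    """True if the chosen percentile of end-of-speech-to-result latency exceeds limit_s."""
    return percentile(latencies_s, pct) > limit_s


# Hypothetical samples: most results arrive in 0.5-1 s, two are slow.
samples = [0.6, 0.7, 0.8, 0.9, 0.6, 0.7, 0.8, 0.9, 0.6, 0.7,
           0.8, 0.9, 0.6, 0.7, 0.8, 0.9, 0.6, 0.7, 1.4, 1.6]
print(looks_underpowered(samples))  # True
```

A percentile is used because a handful of slow look-ups, which the answer says can make the system seem slower, would barely move an average.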

Also, reliability is definitely better than it was, but sometimes it feels more like a toy someone decided to plug into the system so it can feel futuristic.

In general, yes. On reliability, you really need to look at the overall numbers to get a sense of the system. It is a battle of statistics in which the individual isn't very important (unless they hold the title of VP or above). By optimizing the input (adjusting the prompts), resource usage and other speech recognition tuning parameters, you try to maximize accuracy. For basic natural-language responses, you can get accuracy into the upper 90s. However, your overall success rate is much lower. Imagine 5 prompts, each at 98% (in reality, you tend to have a bunch of 99s and then a few in the mid-90s or slightly below): .98 × .98 × .98 × .98 × .98 ≈ 90%. That means roughly 1 caller in 10 fails, and that is before caller confusion and business rules. DTMF input is usually very near 100%, even after several inputs.
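The compounding argument is simply a product of per-prompt success rates, assuming each prompt succeeds or fails independently (a simplification, since the cons list in this answer notes that failures cluster on particular prompts and callers). The mixed rates in the second example are hypothetical, chosen to match the "a bunch of 99s and a few mid-90s" description.

```python
import math


def overall_success(per_prompt_rates):
    """Probability of getting through every prompt, assuming independent prompts."""
    return math.prod(per_prompt_rates)


# Five prompts at 98% each, as in the answer.
print(round(overall_success([0.98] * 5), 4))                      # 0.9039
# Hypothetical mix: three prompts at 99%, one at 95%, one at 93%.
print(round(overall_success([0.99, 0.99, 0.99, 0.95, 0.93]), 4))  # 0.8573
```

Even the hypothetical mix of mostly excellent prompts lands near 86% end to end, which is the gap the answer contrasts with DTMF staying close to 100%.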

Any experience designing IVR or software that used (or chose not to use) speech recognition?

Yes. But I suspect that isn't really the question you want to ask. As someone on the technology side, this is usually not your decision, and you have limited influence over it. If you are really looking for the pros and cons of speech:

Pros:

Cool/hip (note: speech alone isn't sufficient; you need a great VUI and voice talent).
Good for a highly mobile crowd that shuns earpieces. The future is supposed to be blending speech with tactile input. Maybe. It probably won't come from the IVR side of the market.
Good for tasks that can't be done with DTMF. Note that many of these tasks tend to have low success rates with speech as well. Cost (versus human agents), not usability, is usually the driving factor. Dropping a call into a voicemail box for things like address changes can be very cost-effective.

Cons:

Expensive to develop, deploy and maintain. Adding new choices can have a significant impact on success rates if you aren't careful. Always monitor the impact of changes.
Often deployed inappropriately. For example, "just say your numeric menu choice." This is almost always a case of wanting speech coolness without being able to afford what it really takes to achieve it.
Success rates will be lower, and therefore call center costs will be higher.
Failures tend to concentrate on specific prompts and individual callers. A caller who regularly has problems with your system will be very unhappy with you.
Callers get angry when they aren't understood. Is your goal to single out a subset of your customer base and really make them angry?
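Because failures concentrate on specific prompts and individual callers, monitoring the impact of a change means breaking results down along both axes. A minimal sketch, assuming a hypothetical log of `(prompt_id, caller_id, recognized_ok)` tuples; real IVR platforms log far richer data, and these field names are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (prompt_id, caller_id, recognized_ok)
Record = Tuple[str, str, bool]


def prompt_success_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of successful recognitions per prompt."""
    attempts: Counter = Counter()
    successes: Counter = Counter()
    for prompt_id, _caller, ok in records:
        attempts[prompt_id] += 1
        if ok:
            successes[prompt_id] += 1
    return {p: successes[p] / attempts[p] for p in attempts}


def repeat_failers(records: Iterable[Record], min_failures: int = 2) -> List[str]:
    """Callers who failed recognition at least min_failures times."""
    failures = Counter(caller for _prompt, caller, ok in records if not ok)
    return sorted(c for c, n in failures.items() if n >= min_failures)


logs = [
    ("main_menu", "a", True),
    ("main_menu", "b", False),
    ("account", "a", True),
    ("account", "b", False),
    ("account", "b", False),
]
print(prompt_success_rates(logs))
print(repeat_failers(logs))  # ['b']
```

Comparing these per-prompt rates before and after adding a menu choice is one way to catch the regressions described above.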


地狱即天堂 2024-07-29 06:10:29

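I think speech recognition, like any input method, has pros and cons.

Pros

No learning curve; we've been talking since we were very young.
Very intuitive for users.
No need to keep taking the handset away from your ear during a call.

Cons

Longer waiting time.
If the audio quality is poor, it can take several attempts to get the right choice.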


吹梦到西洲 2024-07-29 06:10:29

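In some cases, companies need to support rotary phones. It may turn out to be more cost-effective to set up only a recognition system rather than both.

Speech recognition has much more overhead than touch-tone. If you want the best results, you need to constantly tune the application and train the system on pronunciations of words it fails to recognize. You also need to be very careful about how you prompt users with speech recognition, or you may get unexpected responses.

Overall, touch-tone is much easier, because there is only a limited set of possible options at any given time.

If your application is simple enough, speech recognition will only complicate it. Press 2 for other languages...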


扛刀软妹 2024-07-29 06:10:29

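Speech recognition combined with touch-screen technology is definitely the way of the future. As an example, I use tazti speech recognition. It comes in XP and Vista versions. Since Microsoft's "Surface" touch-screen platform runs on Vista, I'm sure tazti will work with touch-screen technology. When I tried tazti, the built-in commands worked well. It also lets me create my own voice commands, and those work well too. Voice searches of Google, Yahoo, Wikipedia, YouTube and many other search engines work well. There are many other features as well, but it has no dictation feature. I found that I eliminated 70% or more of the clicks I make on the internet... maybe more. Note: tazti can be downloaded free from their website.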
