We have been developing the CeedVocal SDK since 2008. It is based on the Julius and FLite open source projects.
Here's some context: back in 2008 we wanted to build our own speech recognition app (Vocalia). We picked Julius (after hesitating over Pocket Sphinx, which appears to be good as well) and optimized its file format so that it would boot in 1-2 seconds instead of 20 seconds on the original iPhone. Then we dutifully trained our own acoustic models in 6 languages. We designed the API and eventually decided to offer it to other developers as an SDK.
CeedVocal basically supports 2 modes of operation:
matching of words (or small phrases)
keyword spotting
In the first mode of operation, it tries to align the input speech to a word (or phrase) in its list of acceptable inputs. This forces the input onto a known word, even if the speech is actually something else. Accuracy is good. In the second mode of operation, it tries to pick out one of its keywords from a stream of speech. This is a harder problem, and it can be less accurate.
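To make the difference between the two modes concrete, here is a rough text-level analogy in Python. It is not the CeedVocal API and it does not touch audio at all: a real recognizer scores speech against acoustic models, whereas this sketch just compares strings. The function names `match_closed_list` and `spot_keywords` are made up for illustration.

```python
import difflib


def match_closed_list(heard: str, vocabulary: list[str]) -> str:
    """Mode 1 analogue: always return the closest entry of a closed list."""
    if not vocabulary:
        raise ValueError("vocabulary must not be empty")
    # cutoff=0.0 accepts even a very poor match, so the input is always
    # forced onto some known entry, whatever was actually said.
    return difflib.get_close_matches(heard, vocabulary, n=1, cutoff=0.0)[0]


def spot_keywords(stream: str, keywords: list[str]) -> list[str]:
    """Mode 2 analogue: report single-word keywords found in a longer stream."""
    tokens = stream.lower().split()
    # Everything that is not a keyword is simply ignored.
    return [kw for kw in keywords if kw.lower() in tokens]


if __name__ == "__main__":
    commands = ["play", "stop", "pause"]
    print(match_closed_list("plaay", commands))
    print(match_closed_list("banana", commands))  # still returns some command
    print(spot_keywords("could you please stop the music", commands))
```

The first function never says "no match", which mirrors why the first mode is accurate only when the user really does say one of the listed words. The second has to ignore everything around the keyword, which is the part that is hard to do reliably on real speech.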
If you want to track just a few keywords, you should not look for a speech recognition API or service. This task is called keyword spotting, and it uses different algorithms than speech recognition. Speech recognition tries to find all the words that have been said, and because of that it consumes far more resources than keyword spotting. A keyword spotter only tries to find a few selected keywords or keyphrases. It is much simpler and much less resource-intensive.
The only possible solution to achieve this functionality is to use an open source package like OpenEars, which is powered by Pocketsphinx:
http://www.politepix.com/openears
OpenEars has the Rejecto plugin, which implements something similar.
Pocketsphinx itself has recently implemented effective open source keyword spotting too, but it hasn't made it into OpenEars yet. It is only available through the pocketsphinx API: you need to create a kws search and set the target word to look for. I hope this functionality will reach OpenEars soon too.
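As a small illustration of "setting the target words", the sketch below builds a keyphrase list in the format shown in the CMUSphinx keyword spotting tutorial: one phrase per line, followed by a detection threshold between slashes. The helper name `format_kws_list` and the example phrases and thresholds are my own assumptions. It only produces the text of the list and does not call pocketsphinx itself.

```python
def format_kws_list(keyphrases: dict[str, float]) -> str:
    """Return keyphrase-list text: one 'phrase /threshold/' entry per line."""
    lines = []
    for phrase, threshold in keyphrases.items():
        if threshold <= 0.0:
            raise ValueError(f"threshold for {phrase!r} must be positive")
        # Collapse runs of whitespace so each entry stays on a single line.
        normalized = " ".join(phrase.split())
        lines.append(f"{normalized} /{threshold:g}/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; thresholds normally need tuning per phrase.
    print(format_kws_list({"oh mighty computer": 1e-40,
                           "hello world": 1e-30}), end="")
```

In the pocketsphinx releases I have used, a file with this content could be passed to the decoder with the `-kws` option (for example `pocketsphinx_continuous -inmic yes -kws keyphrase.list`). Option names and tools have changed between versions, so check the documentation for the release you have.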
Nuance gives developers free access (but not for high volume) - See http://www.masshightech.com/stories/2011/09/26/daily13-Nuance-tweaks-mobile-dev-program-with-free-access-to-Dragon.html or http://dragonmobile.nuancemobiledeveloper.com/public/index.php?task=home
Nuance services are typically offered commercially and require up-front fees and transaction fees. The interesting news above is that they now make low-volume use of their services available to developers for free. So for development, testing, and demonstration you can probably use the free Nuance services. However, unlike the Google services that come free with Android, if your app has thousands of users you will likely have to pay for Nuance services.