Speech recognition: detecting Japanese Kana (consonant and vowel)
I would like to find some open source code (although I would settle for a closed source product) that takes an incoming audio stream of spoken Japanese and prints out the Kana (i.e. consonant+vowel pairs) in close to real time.
However, I want to use these basic sound units for my own custom purpose, so I don't want any high-level processing that tries to extract genuine Japanese words. I just want to get the raw Kana.
Is anyone aware of such a technology?
I just learned today that the Japanese 'alphabet' is basically a 10x5 grid of Kana: 10 columns (empty + 9 consonants) and 5 rows (vowels).
Each element is called a 'Kana', and the language consists of sequences of these Kana; they are the basic building blocks.
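To make the grid concrete, here is a minimal Python sketch of the layout described above, using hiragana. The names `VOWELS`, `CONSONANTS`, `GRID`, `kana_for` and `all_kana` are illustrative only. Note that the 10x5 picture is a simplification: a few cells are unused in modern Japanese (marked `None` here), the romanized labels are nominal (し is pronounced "shi", つ "tsu"), and ん, the voiced variants and the small-kana combinations sit outside this basic grid.

```python
# Rows are the five vowels; columns are the empty onset plus nine consonants.
VOWELS = ["a", "i", "u", "e", "o"]
CONSONANTS = ["", "k", "s", "t", "n", "h", "m", "y", "r", "w"]

# Hiragana listed column by column. None marks cells unused in modern Japanese.
GRID = {
    "":  ["あ", "い", "う", "え", "お"],
    "k": ["か", "き", "く", "け", "こ"],
    "s": ["さ", "し", "す", "せ", "そ"],
    "t": ["た", "ち", "つ", "て", "と"],
    "n": ["な", "に", "ぬ", "ね", "の"],
    "h": ["は", "ひ", "ふ", "へ", "ほ"],
    "m": ["ま", "み", "む", "め", "も"],
    "y": ["や", None, "ゆ", None, "よ"],
    "r": ["ら", "り", "る", "れ", "ろ"],
    "w": ["わ", None, None, None, "を"],
}


def kana_for(consonant, vowel):
    """Look up the kana for a (consonant, vowel) label pair, or None if unused."""
    return GRID[consonant][VOWELS.index(vowel)]


def all_kana():
    """Flatten the grid into a list, skipping the unused cells."""
    return [k for c in CONSONANTS for k in GRID[c] if k is not None]


if __name__ == "__main__":
    print(kana_for("k", "a"))
    print(len(all_kana()))
```

A recognizer that outputs raw Kana would effectively be classifying over a symbol set like `all_kana()`, extended with the extra symbols mentioned above.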
This must surely have a large impact on speech recognition algorithms.
For Western languages, all the commercial speech recognition engines I am aware of derive from CMUSphinx, which operates on a tri-gram model: it represents each movement between three phonemes with a unique MFCC vector and works out the most likely tri-gram sequence(s) for an utterance. From that it can trivially deduce the phonemes, and then run through its dictionary of word triplets to work out the most likely sentence.
But for a language such as Japanese, I would guess that this may no longer be the most efficient algorithm.
Instead, it may make sense to try to catch each individual Kana, or Kana pair, which would be a 2-gram or a 4-gram, but not a 3-gram!
Is there anything out there? Or do they just use the same engines the Western world does?
Comments (1)
Julius has acoustic and language models for Japanese.
Give it a try and see whether it suits your application.
I don't know how they trained the language models, but Julius can support n-grams of any order in the
reverse pass. In the forward pass, it supports bigrams. It is common to use a 4-gram in the reverse
pass. Both LMs are combined using a Julius tool.
Luis
ASR Labs
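As a rough illustration of the forward-bigram / reverse-4-gram idea mentioned in the answer, the following Python sketch counts both kinds of n-gram over Kana sequences. It is not Julius code and does not produce Julius model files; the function names, the `<s>` / `</s>` boundary markers, and the assumption that the reverse model is counted over the reversed sequence are all illustrative choices made here.

```python
from collections import Counter


def ngrams(seq, n):
    """Return every contiguous n-gram of seq as a tuple."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def count_forward_and_reverse(sequences, reverse_order=4):
    """Count forward bigrams and reverse n-grams over kana sequences.

    Each sequence is padded with hypothetical boundary markers. The reverse
    counts are taken over the reversed sequence, which is an assumption of
    this sketch rather than a description of any particular toolkit.
    """
    forward = Counter()
    backward = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        forward.update(ngrams(padded, 2))
        backward.update(ngrams(padded[::-1], reverse_order))
    return forward, backward


if __name__ == "__main__":
    data = [["こ", "ん", "に", "ち", "は"]]
    fwd, bwd = count_forward_and_reverse(data)
    print(fwd.most_common(3))
    print(bwd.most_common(3))
```

Because the units are Kana rather than words, counts like these would describe which Kana tend to follow one another, which is closer to what the question asks for than a word-level language model.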