1D multiple peak detection?
I am currently trying to implement basic speech recognition in AS3. I need this to be completely client-side, so I can't use powerful server-side speech recognition tools. My idea is to detect the syllables in a word and use that to determine which word was spoken. I am aware that this will greatly limit the recognition capabilities, but I only need to recognize a few keywords, and I can make sure they all have a different number of syllables.
I am currently able to generate a 1D array of voice levels for a spoken word, and if I plot it I can clearly see that in most cases there are distinct peaks for the syllables. However, I am completely stuck on how to find those peaks. I only really need the count, but I suppose that comes with finding them. At first I thought of grabbing a few maximum values and comparing them with the average of the values, but I had forgotten that one peak can be bigger than the others, and as a result all my "peaks" were located on a single actual peak.
I stumbled onto some Matlab code that looks almost too short to be true, but I can't verify that because I am unable to convert it to any language I know. I tried AS3 and C#. So I am wondering if you could point me in the right direction, or if you have any pseudo-code for peak detection?
Comments (3)
The Matlab code is pretty straightforward. I'll try to translate it into something more pseudocode-like.
It should be easy to translate to ActionScript/C#. You should try this yourself and post follow-up questions with your code if you get stuck; that way you'll learn the most.
Finding the peaks and valleys of a curve is all about looking at the slope of the line. At a peak or a valley the slope is 0, and it changes sign from one side to the other. Since I am guessing a voice curve is very irregular, it must first be smoothed until only the significant peaks remain.
So as I see it, the curve should be treated as a set of points. Groups of neighbouring points should be averaged to produce a simple smooth curve. Then the difference between each pair of consecutive points should be computed, and the regions where consecutive points differ very little identified as peaks, valleys, or plateaus.
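The smoothing and slope idea above can be sketched roughly as follows. This is a minimal illustration in Python, not the Matlab code from the question or anyone's final AS3 code. The function names `moving_average` and `find_peaks`, and the `window` and `min_height` parameters, are hypothetical and would need tuning against real voice-level data.

```python
def moving_average(values, window):
    """Smooth a list by averaging each point with its neighbours.

    The window is clipped at both ends, so edge points are averaged
    over fewer samples.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def find_peaks(values, window=5, min_height=0.0):
    """Return indices where the smoothed curve stops rising and starts falling.

    window and min_height are assumed tuning knobs: a larger window
    removes more jitter, and min_height ignores low bumps such as
    background noise.
    """
    smoothed = moving_average(values, window)
    peaks = []
    rising = False
    for i in range(1, len(smoothed)):
        diff = smoothed[i] - smoothed[i - 1]
        if diff > 0:
            rising = True
        elif diff < 0:
            # Slope changed from positive to negative: the previous point is a peak.
            if rising and smoothed[i - 1] >= min_height:
                peaks.append(i - 1)
            rising = False
        # diff == 0 is a plateau: keep the current state and wait for the
        # next real change, so a flat top is counted once (at its last point).
    return peaks


# Made-up voice levels with two bumps, standing in for two syllables.
levels = [0, 1, 3, 6, 3, 1, 0, 1, 4, 8, 4, 1, 0]
print(len(find_peaks(levels, window=3)))  # 2
```

Because each peak is detected from a sign change in the slope rather than from the largest values, one tall peak cannot absorb all the detections, which was the problem with comparing maxima against the average.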
If anyone wants the final code in AS3, here it is:
}