How do I generate the triangular windows for the MFCC algorithm, and how do I use them?
I am implementing the MFCC algorithm in Java.
There is sample Matlab code here: http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m. However, I have some problems with the mel filter bank step. How do I generate the triangular windows, and how do I use them?
PS1: An article with a section that describes MFCC: http://arxiv.org/pdf/1003.4083
PS2: A document that walks through the basic steps of the MFCC algorithm would be helpful.
PS3: My main question is related to this one: "MFCC with Java Linear and Logarithmic Filters". Some implementations use both linear and logarithmic filters, and some do not. What are these filters, and what is the center frequency concept? I am following this code: MFCC Java. How does it differ from this code: MFCC Matlab?
Comments (2)
Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band (defined as the frequency space between center frequency i-1 and center frequency i+1). You're basically looking for something like:
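A minimal Java sketch of that band integration, under a few assumptions that are not from the original answer: the center frequencies are already expressed as strictly increasing FFT bin indices, the input is the magnitude spectrum of one frame, and the names `TriangularBands`, `triWeight`, and `bandEnergies` are hypothetical.

```java
import java.util.Arrays;

/** Sketch: integrate an FFT magnitude spectrum over triangular bands. */
final class TriangularBands {

    /** Weight of bin j in a triangle that rises from lo to center and falls to hi. */
    static double triWeight(int j, int lo, int center, int hi) {
        if (j <= lo || j >= hi) return 0.0;                 // outside the band
        if (j <= center) return (double) (j - lo) / (center - lo);
        return (double) (hi - j) / (hi - center);
    }

    /**
     * centers holds numBands + 2 strictly increasing bin indices (assumption).
     * Band i peaks at centers[i + 1] and spans centers[i] .. centers[i + 2],
     * i.e. from the previous center frequency to the next one.
     */
    static double[] bandEnergies(double[] magnitude, int[] centers) {
        double[] energies = new double[centers.length - 2];
        for (int i = 0; i < energies.length; i++) {
            int lo = centers[i], mid = centers[i + 1], hi = centers[i + 2];
            for (int j = lo; j <= hi && j < magnitude.length; j++) {
                energies[i] += magnitude[j] * triWeight(j, lo, mid, hi);
            }
        }
        return energies;
    }

    public static void main(String[] args) {
        double[] magnitude = {1, 1, 1, 1, 1, 1, 1, 1, 1};  // flat toy spectrum
        int[] centers = {0, 2, 4, 8};                       // two bands
        System.out.println(Arrays.toString(bandEnergies(magnitude, centers)));
        // prints [2.0, 3.0]
    }
}
```

Because each band ends at the next band's center, adjacent triangles overlap by half their width. Whether to feed in magnitudes or squared magnitudes (power) is a design choice; the loop is the same either way.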
If you do not understand the concept of a "center frequency" or a "band" or a "filter," pick up an elementary signals textbook; you shouldn't be implementing this algorithm without understanding what it does.
As for what the exact center frequencies are, it's up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. There are no definitive values, or even a definitive scale for them, because this algorithm tries to approximate the human ear, which is a very complicated listening device. One scale may work better for, say, speech, while another may work better for music. It's up to you to choose what is appropriate.
Answer to PS2: I found this tutorial, which really helped me compute the MFCCs.
As for the triangular windows and the filterbanks, from what I understood, they do overlap and they do not extend to negative frequencies. The whole process of computing them from the FFT spectrum and applying them back to it goes something like this:
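A minimal Java sketch of such a process, not a definitive implementation. It assumes one common mel formula, mel = 2595 * log10(1 + hz / 700), filter edges equally spaced in mel, and the bin mapping floor((fftSize + 1) * hz / sampleRate). The class `MelFilterbank`, its methods, and the parameter values in `main` are hypothetical choices for illustration.

```java
import java.util.Arrays;

/** Sketch: build overlapping triangular mel filters and apply them to a power spectrum. */
final class MelFilterbank {

    // One common mel-scale formula (an assumption; other scales exist).
    static double hzToMel(double hz) { return 2595.0 * Math.log10(1.0 + hz / 700.0); }
    static double melToHz(double mel) { return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0); }

    /** Returns numFilters rows of weights over the fftSize / 2 + 1 non-negative-frequency bins. */
    static double[][] build(int numFilters, int fftSize, double sampleRate,
                            double lowHz, double highHz) {
        int numBins = fftSize / 2 + 1;
        // numFilters + 2 points equally spaced in mel, mapped back to FFT bin indices.
        int[] bin = new int[numFilters + 2];
        double lowMel = hzToMel(lowHz), highMel = hzToMel(highHz);
        for (int i = 0; i < bin.length; i++) {
            double hz = melToHz(lowMel + (highMel - lowMel) * i / (numFilters + 1));
            int b = (int) Math.floor((fftSize + 1) * hz / sampleRate);
            bin[i] = Math.min(numBins - 1, Math.max(0, b));  // stay inside the spectrum
        }
        double[][] weights = new double[numFilters][numBins];
        for (int m = 1; m <= numFilters; m++) {
            int lo = bin[m - 1], mid = bin[m], hi = bin[m + 1];
            for (int k = lo; k < mid; k++)                   // rising edge
                weights[m - 1][k] = (double) (k - lo) / (mid - lo);
            weights[m - 1][mid] = 1.0;                       // peak
            for (int k = mid + 1; k < hi; k++)               // falling edge
                weights[m - 1][k] = (double) (hi - k) / (hi - mid);
        }
        return weights;
    }

    /** Weighted sum of the power spectrum under each triangle. */
    static double[] apply(double[][] weights, double[] powerSpectrum) {
        double[] energies = new double[weights.length];
        for (int m = 0; m < weights.length; m++) {
            int n = Math.min(weights[m].length, powerSpectrum.length);
            for (int k = 0; k < n; k++) energies[m] += weights[m][k] * powerSpectrum[k];
        }
        return energies;
    }

    public static void main(String[] args) {
        double[][] fb = build(26, 512, 16000.0, 0.0, 8000.0);
        double[] power = new double[257];
        Arrays.fill(power, 1.0);                             // flat toy spectrum
        System.out.println(Arrays.toString(apply(fb, power)));
    }
}
```

Each filter's upper edge is the next filter's peak, so neighbouring triangles overlap, and only bins 0 through fftSize / 2 are used, so nothing extends to negative frequencies.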
These are your filterbank energies. You can then take their log and apply the DCT to extract the MFCCs.
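The log and DCT steps can be sketched in Java as below. This assumes the natural log, a small floor of 1e-12 to avoid log(0), and an unnormalized DCT-II; scaling conventions differ between implementations, and the class and method names are hypothetical.

```java
/** Sketch: turn filterbank energies into cepstral coefficients. */
final class MfccFromEnergies {

    /** Returns the first numCoeffs DCT-II coefficients of the log energies. */
    static double[] mfcc(double[] energies, int numCoeffs) {
        int n = energies.length;
        double[] logE = new double[n];
        for (int m = 0; m < n; m++) {
            // Floor the energy so an empty band does not produce -Infinity.
            logE[m] = Math.log(Math.max(energies[m], 1e-12));
        }
        double[] c = new double[numCoeffs];
        for (int k = 0; k < numCoeffs; k++) {
            for (int m = 0; m < n; m++) {
                // Unnormalized DCT-II basis: cos(pi * k * (m + 1/2) / n)
                c[k] += logE[m] * Math.cos(Math.PI * k * (m + 0.5) / n);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] energies = {2.0, 3.0, 5.0, 4.0, 1.5, 0.7};  // toy values
        double[] coeffs = mfcc(energies, 4);
        for (double v : coeffs) System.out.println(v);
    }
}
```

Coefficient 0 is just the sum of the log energies, which is why some implementations drop it or replace it with a separate frame energy term.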