How do I generate the triangular windows for the MFCC algorithm, and how do I use them?

Posted on 2024-11-09 07:27:05

I am implementing MFCC algorithm in Java.

Sample Matlab code is available at http://www.ee.columbia.edu/~dpwe/muscontent/practical/mfcc.m but I have some problems with the mel filter bank step. How are the triangular windows generated, and how are they used?

PS1: An article which has a part that describes MFCC: http://arxiv.org/pdf/1003.4083

PS2: A document that walks through the basic steps of the MFCC algorithm would be helpful.

PS3: My main question is related to this one: MFCC with Java Linear and Logarithmic Filters. Some implementations use both linear and logarithmic filters and some do not. What are these filters, and what is the center frequency concept? I am following this code: MFCC Java. How does it differ from this code: MFCC Matlab?

Comments (2)

时光清浅 2024-11-16 07:27:05

Triangular windows as frequency band filters aren't hard to implement. You basically want to integrate the FFT data within each band (defined as the frequency space between center frequency i-1 and center frequency i+1).

You're basically looking for something like,

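// centerFreqs holds numBands + 2 FFT bin indices in strictly increasing order:
// the lower edge, the numBands center frequencies, and the upper edge.
// Band i therefore runs from centerFreqs[i-1] to centerFreqs[i+1], for i = 1..numBands.
for (int bandIdx = 1; bandIdx <= numBands; bandIdx++) {
    int startFreqIdx  = centerFreqs[bandIdx - 1];
    int centerFreqIdx = centerFreqs[bandIdx];
    int stopFreqIdx   = centerFreqs[bandIdx + 1];

    // Rising edge: the weight grows from 0 at startFreqIdx towards 1 at centerFreqIdx.
    double risingWidth = centerFreqIdx - startFreqIdx;
    for (int freq = startFreqIdx; freq < centerFreqIdx; freq++) {
        bandData[bandIdx - 1] += fftData[freq] * (freq - startFreqIdx) / risingWidth;
    }

    // Falling edge: the weight drops from 1 at centerFreqIdx to 0 at stopFreqIdx.
    double fallingWidth = stopFreqIdx - centerFreqIdx;
    for (int freq = centerFreqIdx; freq <= stopFreqIdx; freq++) {
        bandData[bandIdx - 1] += fftData[freq] * (stopFreqIdx - freq) / fallingWidth;
    }
}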

If you do not understand the concept of a "center frequency" or a "band" or a "filter," pick up an elementary signals textbook--you shouldn't be implementing this algorithm without understanding what it does.

As for the exact center frequencies, that is up to you. Experiment and pick (or find in publications) values that capture the information you want to isolate from the data. There are no definitive values, or even a definitive scale, because this algorithm tries to approximate the human ear, which is a very complicated listening device. One scale may work better for speech, say, while another may work better for music. It's up to you to choose what is appropriate.

睫毛上残留的泪 2024-11-16 07:27:05

Answer to the second PS: I found this tutorial, which really helped me compute the MFCCs.

As for the triangular windows and the filterbanks, from what I understood, they do overlap and they do not extend to negative frequencies. The whole process of computing them from the FFT spectrum and applying them back to it goes something like this:

  1. Choose a minimum and a maximum frequency for the filters (for example, min freq = 300Hz, the minimum voice frequency, and max freq = your sample rate / 2. Maybe this is where you should choose the 1000Hz limit you were talking about).
  2. Compute the mel values of the chosen min and max frequencies. Formula here.
  3. Compute N equally spaced values between these two mel values. (I've seen examples with different values for N; you can even find an efficiency comparison for different values in this work. For my tests I picked 26.)
  4. Convert these values back to Hz (you can find the formula on the same wiki page) => together with the two end points, an array of N + 2 filter values.
  5. Compute a filterbank (filter triangle) for each three consecutive values, either the way Thomas suggested above (being careful with the indexes) or as in the tutorial recommended at the beginning of this post => an array of arrays of size NxM, assuming your FFT returned 2*M values and you only use M.
  6. Pass the whole power spectrum (the M values obtained from the FFT) through each triangular filter to get a "filterbank energy" for each filter: for each filterbank (N loop), multiply each value obtained from the FFT by the corresponding value in that filterbank (M loop) and add up the M products => an N-sized array of energies.
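The six steps above can be sketched in Java roughly as follows. This is only a minimal sketch under several assumptions: the class and method names are made up for illustration, the mel conversion uses the common `2595 * log10(1 + f / 700)` form (the formula the post links to is not reproduced here), and each triangle is evaluated directly at the bin frequencies `k * sampleRate / (2 * M)` instead of rounding the filter points to bin indices as some tutorials do.

```java
// Hypothetical helper class; names and parameters are illustrative only.
class MelFilterbankSketch {

    // Hz -> mel, assuming the common 2595 * log10(1 + f / 700) variant.
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    // Inverse of hzToMel.
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    // Steps 1-4: numFilters + 2 frequencies in Hz, equally spaced on the mel scale.
    static double[] melPointsHz(double minHz, double maxHz, int numFilters) {
        double minMel = hzToMel(minHz);
        double maxMel = hzToMel(maxHz);
        double[] points = new double[numFilters + 2];
        for (int i = 0; i < points.length; i++) {
            double mel = minMel + i * (maxMel - minMel) / (numFilters + 1);
            points[i] = melToHz(mel);
        }
        return points;
    }

    // Step 5: one triangle per three consecutive points, sampled at numBins (M) bins.
    // Bin k is assumed to sit at k * sampleRate / (2 * numBins) Hz.
    static double[][] buildFilterbank(double[] pointsHz, int numBins, double sampleRate) {
        int numFilters = pointsHz.length - 2;
        double[][] filters = new double[numFilters][numBins];
        for (int n = 0; n < numFilters; n++) {
            double lo = pointsHz[n], mid = pointsHz[n + 1], hi = pointsHz[n + 2];
            for (int k = 0; k < numBins; k++) {
                double f = k * sampleRate / (2.0 * numBins);
                if (f > lo && f <= mid) {
                    filters[n][k] = (f - lo) / (mid - lo);   // rising edge
                } else if (f > mid && f < hi) {
                    filters[n][k] = (hi - f) / (hi - mid);   // falling edge
                }
            }
        }
        return filters;
    }

    // Step 6: weighted sum of the power spectrum under each triangle.
    static double[] filterbankEnergies(double[][] filters, double[] powerSpectrum) {
        double[] energies = new double[filters.length];
        for (int n = 0; n < filters.length; n++) {
            for (int k = 0; k < powerSpectrum.length; k++) {
                energies[n] += filters[n][k] * powerSpectrum[k];
            }
        }
        return energies;
    }

    public static void main(String[] args) {
        int numBins = 256;                 // M: half of a 512-point FFT
        double sampleRate = 16000.0;       // assumed sample rate
        double[] points = melPointsHz(300.0, sampleRate / 2.0, 26);
        double[][] filters = buildFilterbank(points, numBins, sampleRate);
        double[] flatSpectrum = new double[numBins];
        java.util.Arrays.fill(flatSpectrum, 1.0);   // placeholder power spectrum
        double[] energies = filterbankEnergies(filters, flatSpectrum);
        System.out.println(energies.length + " filterbank energies computed");
    }
}
```

Because the filter matrix depends only on the sample rate, the FFT size, and the chosen frequency range, it can be built once and reused for every frame; only `filterbankEnergies` has to run per frame.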

These are your filterbank energies. You can then take their log and apply the DCT to extract the MFCCs...
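The log and DCT step might look something like the sketch below. It is an illustration under stated assumptions, not a reference implementation: the class and method names are hypothetical, the natural log is used, the `1e-10` floor that guards against `log(0)` is an arbitrary choice, and the DCT-II is left unnormalized (implementations differ on scaling).

```java
// Hypothetical helper class; names and constants are illustrative only.
class MfccFromEnergiesSketch {

    // Natural log of each filterbank energy, with a small floor to avoid log(0).
    static double[] logEnergies(double[] energies) {
        double[] logs = new double[energies.length];
        for (int n = 0; n < energies.length; n++) {
            logs[n] = Math.log(Math.max(energies[n], 1e-10));
        }
        return logs;
    }

    // Unnormalized DCT-II: c[i] = sum over n of x[n] * cos(pi * i * (n + 0.5) / N).
    // Only the first numCoeffs coefficients are computed.
    static double[] dct2(double[] x, int numCoeffs) {
        int size = x.length;
        double[] c = new double[numCoeffs];
        for (int i = 0; i < numCoeffs; i++) {
            for (int n = 0; n < size; n++) {
                c[i] += x[n] * Math.cos(Math.PI * i * (n + 0.5) / size);
            }
        }
        return c;
    }

    // Filterbank energies -> log -> DCT-II -> first numCoeffs cepstral coefficients.
    static double[] mfcc(double[] energies, int numCoeffs) {
        return dct2(logEnergies(energies), numCoeffs);
    }
}
```

With 26 filterbank energies, a call such as `MfccFromEnergiesSketch.mfcc(energies, 13)` would keep the first 13 coefficients; how many to keep is a design choice rather than something the posts above specify.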
