How do I get MFCCs from the FFT of a signal?
SHORT AND SIMPLE:
What steps are involved in getting MFCCs from an FFT?
DETAILED:
I'm working on a drum application to classify sounds. It's a matching application for the iPhone that uses the openFrameworks library for sound processing. The idea is to return the name of the note that you play on the loud Indian drum known as the dhol; only a few notes are playable.
I've implemented the FFT algorithm and successfully obtained a spectrum. I now want to take it one step further and compute the MFCCs from the FFT.
This is what I understand so far.
It's based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
It uses triangular filters to group the frequencies into bands and get the desired coefficients.
http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png
So if the FFT algorithm returns around 1000 values (the spectrum of the sound), you should ideally end up with around 12 elements (i.e., coefficients). This 12-element vector is used to classify the instrument, including the drum played...
This is all I'm trying to achieve.
Could someone please help me with how to do something like this?
Any help would be greatly appreciated. Cheers
Comments (1)
First, you have to split the signal into short frames of 10 to 30 ms, apply a window function (a Hamming window is recommended for audio applications), and compute the Fourier transform of each frame. With the DFT, to compute the Mel Frequency Cepstral Coefficients you have to follow these steps:
A Python code example:
This code is based on the MFCC Vamp example. I hope this helps you!
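For reference, the usual pipeline after the FFT is: take the power spectrum, pass it through a bank of triangular filters spaced evenly on the mel scale, take the log of each band energy, and apply a DCT. Below is a minimal sketch of that pipeline in plain Python (standard library only, so it should be easy to port to C++ for openFrameworks). It is not the Vamp-based code mentioned above. The function names, the default of 26 filters, and the choice to skip coefficient 0 are assumptions made for illustration, and the input spectrum in the usage lines is made up.

```python
import math

def hz_to_mel(hz):
    # Common mel-scale formula (one of several variants in use).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hamming(n):
    # Hamming window of length n; multiply each frame by it before your FFT.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def mel_filterbank(num_filters, num_bins, sample_rate):
    # num_bins spectrum bins are assumed to cover 0 Hz .. sample_rate / 2.
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # num_filters + 2 edge points, evenly spaced in mel, expressed as
    # (fractional) bin positions.
    edges = [mel_to_hz(top * i / (num_filters + 1)) * (num_bins - 1) / nyquist
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(magnitudes, sample_rate, num_filters=26, num_coeffs=12):
    # 1. Power spectrum from the FFT magnitudes.
    power = [m * m for m in magnitudes]
    # 2. Energy in each triangular mel band.
    bank = mel_filterbank(num_filters, len(power), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    # 3. Log of each band energy (floored to avoid log(0)).
    log_e = [math.log(max(e, 1e-10)) for e in energies]
    # 4. DCT-II of the log energies; coefficient 0 is skipped here
    #    (an assumption: it mostly reflects overall loudness).
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for c in range(1, num_coeffs + 1)]

# Made-up input: 513 magnitude bins (as from a 1024-point FFT) at 44.1 kHz.
spectrum = [1.0 if k == 40 else 0.01 for k in range(513)]
coeffs = mfcc(spectrum, 44100)
print(len(coeffs))  # 12
```

You would call `mfcc` once per windowed frame and feed the resulting 12-element vectors to whatever classifier you use for the drum notes.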