Mel Frequency Cepstrum Coefficients algorithm
I want to extract the timbre of some audio.
To do that, I am going to implement the Mel Frequency Cepstrum Coefficients (MFCC) algorithm.
The implementation looks simple (I have already done step 1):
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
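For reference, steps 3 to 5 can be sketched in a few lines once the mel-band powers from step 2 are available. This is only a minimal sketch under some assumptions: the function names, the `floor` value used to avoid `log(0)`, and the default of 13 coefficients are illustrative choices, not part of the steps listed above, and the DCT is an unnormalized DCT-II written out directly rather than taken from a library.

```python
import math

def log_energies(mel_energies, floor=1e-10):
    # Step 3: log of the power in each mel band.
    # `floor` is an assumed guard against log(0) for silent bands.
    return [math.log(max(e, floor)) for e in mel_energies]

def dct_ii(x):
    # Step 4: unnormalized DCT-II, treating the list as if it were a signal.
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n)
    ]

def mfcc_from_mel_energies(mel_energies, n_coeffs=13):
    # Step 5: keep the first few DCT amplitudes as the MFCCs.
    # n_coeffs=13 is a common but arbitrary choice.
    return dct_ii(log_energies(mel_energies))[:n_coeffs]
```

With a perfectly flat set of band powers, only the first coefficient is non-zero, which is a quick sanity check on the DCT.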
In step 2, I know how to convert from frequency to the mel scale, but I don't know what "triangular overlapping windows" means.
How do I do this step correctly?
What does "triangular overlapping windows" mean?
Comments (1)
Once you've done the conversion to the mel scale, apply a set of overlapping triangular filters spaced evenly along this scale (and therefore more closely spaced at the low frequencies when viewed in Hz). Each filter produces one value: the sum of the spectral power under its triangle, weighted by the triangle's height at each frequency. That is, here you're going from the roughly continuous curve returned by the FFT to a set of 20-50 discrete values.
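To make that concrete, here is a rough sketch of one way to build such a filter bank. It rests on several assumptions that are not from the description above: the 2595 * log10(1 + f/700) mel formula (one of several in use), filters that peak at 1.0 with each triangle's edges sitting at its neighbours' peaks, and the function and parameter names, which are made up for illustration.

```python
import math

def hz_to_mel(f_hz):
    # One common mel formula; other variants exist.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_filters rows of weights, one weight per FFT bin 0..n_fft//2."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_filters + 2 points evenly spaced on the mel scale. Filter i uses
    # points i, i+1, i+2 as its left edge, peak and right edge, so
    # neighbouring triangles overlap by half.
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    edges_hz = [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for i in range(n_filters):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:
                w = (f - left) / (center - left)      # rising slope
            elif center < f < right:
                w = (right - f) / (right - center)    # falling slope
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank

def apply_filterbank(power_spectrum, bank):
    # One output value per filter: the weighted sum of the power spectrum.
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

For example, `apply_filterbank(power, triangular_filterbank(26, 512, 16000))` would turn a 257-bin power spectrum into 26 mel-band values. Some implementations additionally scale each triangle so that all filters have equal area; that is left out here.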
I googled around for pictures of the filters and found a few (both in PDFs), here and here (p. 4). These also describe at some length other details of how they do the calculations.