如何从信号的 FFT 中获取 MFCC？

发布于 2024-11-04 15:46:50 字数 739 浏览 7 评论 0原文

简短： 从 FFT 获取 MFCC 涉及哪些步骤？

详细：

我正在开发一个鼓应用程序来对声音进行分类。它是 iPhone 的一个匹配应用程序，具有用于声音处理的 openframeworks 库，其想法是返回您在响亮的印度鼓（称为 Dhol）上演奏的音符的名称 - 只有少数音符可以演奏。

我已经实现了FFT算法并成功获得了频谱。我现在想更进一步，从 fft 返回 mfcc。

这是我目前所理解的。它基于非线性梅尔频率范围内对数功率谱的线性余弦变换。

它使用三角测量来滤除频率并获得所需的系数。 http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png

因此，如果从 fft 算法返回大约 1000 个值 -声音的频谱，那么您将获得大约 12 个元素（即系数）。这个 12 元素向量用于对乐器进行分类，包括演奏的鼓......

这就是我想要实现的目标。

有人可以帮助我如何做这样的事情吗？任何帮助将不胜感激。干杯

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

書生途 2024-11-11 15:46:50

首先，您必须将信号分割为 10 到 30 毫秒的小帧，应用窗口函数（建议在声音应用中使用嗡嗡声），并计算信号的傅里叶变换。使用 DFT，要计算梅尔频率倒谱系数，您必须遵循以下步骤：

获取功率谱： |DFT|^2
计算三角组滤波器以将 hz 尺度转换为梅尔尺度
获取对数谱
应用离散余弦变换

Python 代码示例

import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

sampleRate, signal = wavfile.read("file.wav")
numCoefficients = 13 # choose the sive of mfcc array
minHz = 0
maxHz = 22.000  

complexSpectrum = numpy.fft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank())
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)

def melFilterBank(blockSize):
    numBands = int(numCoefficients)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    melRange = numpy.array(xrange(numBands + 2))

    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # each array index represent the center of each triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # Arredonda pra baixo
    centerIndex = numpy.array(aux, int)  # Get int values

    for i in xrange(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(xrange(start, centre)) - start) / k1
        down = (end - numpy.array(xrange(centre, end))) / k2

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()

def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    return 700 * (math.exp(mel / 1127.01048) - 1)

：代码基于 MFCC鞋面示例。我希望这对你有帮助！

First, you have to split the signal in small frames with 10 to 30ms, apply a windowing function (humming is recommended for sound applications), and compute the fourier transform of the signal. With DFT, to compute Mel Frequecy Cepstral Coefficients you have to follow these steps:

Get power spectrum: |DFT|^2
Compute a triangular bank filter to transform hz scale into mel scale
Get log spectrum
Apply discrete cossine transform

A python code example:

import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

sampleRate, signal = wavfile.read("file.wav")
numCoefficients = 13 # choose the sive of mfcc array
minHz = 0
maxHz = 22.000  

complexSpectrum = numpy.fft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank())
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)

def melFilterBank(blockSize):
    numBands = int(numCoefficients)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    melRange = numpy.array(xrange(numBands + 2))

    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # each array index represent the center of each triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # Arredonda pra baixo
    centerIndex = numpy.array(aux, int)  # Get int values

    for i in xrange(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(xrange(start, centre)) - start) / k1
        down = (end - numpy.array(xrange(centre, end))) / k2

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()

def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    return 700 * (math.exp(mel / 1127.01048) - 1)

This code is based on MFCC Vamp example. I hope this help you!

回复收藏 0 原文

~没有更多了~