How do I get MFCCs from the FFT of a signal?

Posted 2024-11-04 15:46:50

SHORT AND SIMPLE:
What steps are involved in getting MFCCs from an FFT?

DETAILED:

I'm working on a drum application that classifies sounds. It's a matching application for the iPhone that uses the openframeworks library for sound processing. The idea is to return the name of the note you play on the loud Indian drum known as the Dhol, on which only a few notes are playable.

I've implemented the FFT algorithm and successfully obtained a spectrum. I now want to take it one step further and compute the MFCCs from the FFT.

This is what I understand so far.
It's based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

It uses a bank of triangular filters to weight the frequencies and obtain the desired coefficients:
http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png

So if the FFT returns around 1000 values (the spectrum of the sound), you would ideally end up with around 12 elements (the coefficients). This 12-element vector is then used to classify the instrument, including which drum was played...

This is all I'm trying to achieve.

Could someone please help me on how to do something like this?
Any help would be greatly appreciated. Cheers

書生途 2024-11-11 15:46:50

First, you have to split the signal into short frames of 10 to 30 ms, apply a window function (a Hamming window is recommended for sound applications), and compute the Fourier transform of each frame. Once you have the DFT, computing the Mel Frequency Cepstral Coefficients takes these steps:

  1. Get the power spectrum: |DFT|^2
  2. Apply a triangular filter bank to map the Hz scale onto the mel scale
  3. Take the log of the filtered spectrum
  4. Apply the discrete cosine transform
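
The framing and windowing step that precedes the list can be sketched as follows. This is only a minimal illustration, not part of the original answer: the helper name `frame_signal`, the 20 ms frame length, the 10 ms hop, and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy


def frame_signal(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    frame_ms and hop_ms are illustrative defaults, not values from the answer.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    window = numpy.hamming(frame_len)
    frames = numpy.empty((num_frames, frame_len))
    for i in range(num_frames):
        start = i * hop_len
        frames[i] = signal[start:start + frame_len] * window
    return frames


# Hypothetical input: one second of a 440 Hz tone sampled at 44.1 kHz.
sample_rate = 44100
t = numpy.arange(sample_rate) / float(sample_rate)
signal = numpy.sin(2 * numpy.pi * 440.0 * t)

frames = frame_signal(signal, sample_rate)
# rfft keeps only the non-negative frequencies of a real-valued frame.
power_spectra = numpy.abs(numpy.fft.rfft(frames, axis=1)) ** 2
print(frames.shape, power_spectra.shape)
```

Each row of `power_spectra` is the |DFT|^2 of one windowed frame, so steps 2 to 4 would then be applied row by row.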

A Python code example:

import math

import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

sampleRate, signal = wavfile.read("file.wav")  # assumes a mono file
numCoefficients = 13  # choose the size of the mfcc array
minHz = 0
maxHz = sampleRate / 2.0  # Nyquist frequency, 22050 Hz for 44.1 kHz audio

def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    return 700 * (math.exp(mel / 1127.01048) - 1)

def melFilterBank(blockSize):
    # blockSize is the number of spectrum bins between 0 Hz and Nyquist
    numBands = int(numCoefficients)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    melRange = numpy.array(range(numBands + 2))

    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # each array index represents the center of each triangular filter
    # (mel -> Hz -> bin index; aux starts as 1 / 1127.01048)
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / (sampleRate / 2.0)
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # Get int values

    for i in range(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(range(start, centre)) - start) / k1  # rising edge
        down = (end - numpy.array(range(centre, end))) / k2  # falling edge

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()

# For brevity the whole signal is transformed at once; in practice, run
# these steps on each windowed frame.
complexSpectrum = numpy.fft.rfft(signal)  # non-negative frequencies only
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank(len(powerSpectrum)))
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)
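
To make the mel conversion easier to follow, here is a small standalone sketch of the two formulas and of the mel-to-bin mapping that the `aux` lines in `melFilterBank` carry out. The snake_case names and the `mel_to_bin` helper are hypothetical and introduced only for illustration; the constant 1127.01048 is the one used in the code above, and it appears to be chosen so that 1000 Hz maps to roughly 1000 mel.

```python
import math

MEL_SCALE = 1127.01048  # same constant as in freqToMel / melToFreq


def freq_to_mel(freq_hz):
    """Convert a frequency in Hz to mels."""
    return MEL_SCALE * math.log(1.0 + freq_hz / 700.0)


def mel_to_freq(mel):
    """Convert mels back to a frequency in Hz (inverse of freq_to_mel)."""
    return 700.0 * (math.exp(mel / MEL_SCALE) - 1.0)


def mel_to_bin(mel, num_bins, nyquist_hz):
    """Map a mel value to a spectrum bin index (hypothetical helper).

    Assumes num_bins bins evenly cover 0 Hz to nyquist_hz, as in melFilterBank.
    """
    return int(math.floor(0.5 + num_bins * mel_to_freq(mel) / nyquist_hz))


print(freq_to_mel(1000.0))              # close to 1000 mel
print(mel_to_freq(freq_to_mel(440.0)))  # round trip back to about 440 Hz
print(mel_to_bin(freq_to_mel(1000.0), 512, 22050.0))
```

Spacing the filter centres evenly in mels and then mapping them back to bins is what makes the triangles narrow at low frequencies and wide at high frequencies.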

This code is based on the MFCC Vamp example. I hope this helps!
