FFT on iPhone to ignore background noise and find lower pitches

Posted 2024-12-01 09:27:50

I have implemented Demetri's Pitch Detector project for the iPhone and am running into two problems: 1) any sort of background noise sends the frequency reading bananas, and 2) lower-frequency sounds aren't being pitched correctly. I tried to tune my guitar, and while the higher strings worked, the tuner could not correctly discern the low E.

The Pitch Detection code is located in RIOInterface.mm and goes something like this ...

// get the data
AudioUnitRender(...);

// convert int16 to float
Convert(...);

// divide the signal into even-odd configuration
vDSP_ctoz((COMPLEX*)outputBuffer, 2, &A, 1, nOver2);

// apply the fft
vDSP_fft_zrip(fftSetup, &A, stride, log2n, FFT_FORWARD);

// convert split-complex form back to interleaved complex
vDSP_ztoc(&A, 1, (COMPLEX *)outputBuffer, 2, nOver2);

Demetri then goes on to determine the 'dominant' frequency as follows:

float dominantFrequency = 0;
int bin = -1;
for (int i=0; i<n; i+=2) {
    float curFreq = MagnitudeSquared(outputBuffer[i], outputBuffer[i+1]);
    if (curFreq > dominantFrequency) {
        dominantFrequency = curFreq;
        bin = (i+1)/2;
    }
}
memset(outputBuffer, 0, n*sizeof(SInt16));

// Update the UI with our newly acquired frequency value.
[THIS->listener frequencyChangedWithValue:bin*(THIS->sampleRate/bufferCapacity)];

To start with, I believe I need to apply a low-pass filter, but I'm not an FFT expert and I'm not sure exactly where or how to do that against the data returned from the vDSP functions. I'm also not sure how to improve the accuracy of the code at lower frequencies. There seem to be other algorithms for determining the dominant frequency, but again, I'm looking for a kick in the right direction when working with the data returned by Apple's Accelerate framework.

UPDATE:

The Accelerate framework actually has some windowing functions. I set up a basic window like this:

windowSize = maxFrames;
transferBuffer = (float*)malloc(sizeof(float)*windowSize);
window = (float*)malloc(sizeof(float)*windowSize);
memset(window, 0, sizeof(float)*windowSize);
vDSP_hann_window(window, windowSize, vDSP_HANN_NORM); 

which I then apply by inserting

vDSP_vmul(outputBuffer, 1, window, 1, transferBuffer, 1, windowSize); 

before the vDSP_ctoz call. I then changed the rest of the code to use 'transferBuffer' instead of outputBuffer, but so far I haven't noticed any dramatic change in the final pitch guess.

Comments (4)

错々过的事 2024-12-08 09:27:50

Pitch is not the same as peak magnitude frequency bin (which is what the FFT in the Accelerate framework might give you directly). So any peak frequency detector will not be reliable for pitch estimation. A low-pass filter will not help when the note has a missing or very weak fundamental (common in some voice, piano and guitar sounds) and/or lots of powerful overtones in its spectrum.

Look at a wide-band spectrum or spectrograph of your musical sounds and you will see the problem.

Other methods are usually needed for a more reliable estimate of musical pitch. Some of these include autocorrelation methods (AMDF, ASDF), cepstrum/cepstral analysis, harmonic product spectrum, phase vocoder, and/or composite algorithms such as RAPT (Robust Algorithm for Pitch Tracking) and YAAPT. An FFT is useful only as a sub-part of some of the above methods.
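As a rough illustration of the autocorrelation family mentioned above, here is a minimal plain-C sketch. It is not code from the Pitch Detector project: the function name, the parameters and the search-range idea are assumptions made for the example, and a real tuner would likely need an FFT-based autocorrelation and better peak selection.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the original project): estimate pitch in Hz
 * by brute-force time-domain autocorrelation.
 * x: n mono samples; [min_hz, max_hz]: pitch range to search.
 * Returns 0 if no lag with positive correlation is found. */
float estimate_pitch_autocorr(const float *x, size_t n, float sample_rate,
                              float min_hz, float max_hz)
{
    if (x == NULL || n < 2 || min_hz <= 0.0f || max_hz <= min_hz)
        return 0.0f;

    /* A pitch of f Hz repeats every sample_rate / f samples. */
    size_t min_lag = (size_t)(sample_rate / max_hz);
    size_t max_lag = (size_t)(sample_rate / min_hz);
    if (min_lag < 1) min_lag = 1;
    if (max_lag >= n) max_lag = n - 1;

    size_t best_lag = 0;
    float best_score = 0.0f;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        float sum = 0.0f;
        for (size_t i = 0; i + lag < n; i++)
            sum += x[i] * x[i + lag];
        /* Normalise by the overlap length so long lags are not penalised. */
        float score = sum / (float)(n - lag);
        if (score > best_score) {
            best_score = score;
            best_lag = lag;
        }
    }
    return best_lag ? sample_rate / (float)best_lag : 0.0f;
}
```

Because the score depends on the waveform repeating rather than on which spectral bin is loudest, a note with a weak fundamental can still produce its strongest correlation at the fundamental period. Restricting the lag range (for example to 60-400 Hz for a guitar) keeps multiples of the period from winning.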

爱你是孤单的心事 2024-12-08 09:27:50

At the very least you need to apply a window function to your time domain data, prior to calculating the FFT. Without this step the power spectrum will contain artefacts (see: spectral leakage) which will interfere with your attempts at extracting pitch information.

A simple Hann (aka Hanning) window should suffice.

无畏 2024-12-08 09:27:50

What is your sample frequency and block size? Low E is around 82 Hz, so you need to make sure your capture block is long enough to capture many cycles at this frequency. This is because the Fourier transform divides the frequency spectrum into bins, each several Hz wide. If you sample at 44.1 kHz and have a 1024-point time-domain block, for instance, each bin will be 44100/1024 = 43.07 Hz wide. Thus a low E would be in the second bin. For a bunch of reasons (to do with spectral leakage and the nature of finite time blocks), practically speaking you should consider the first 3 or 4 bins of data in an FFT result with extreme suspicion.

If you drop the sample rate to 8 kHz, the same blocksize gives you bins that are 7.8125 Hz wide. Now low E will be in the 10th or 11th bin, which is much better. You could also use a longer blocksize.

And as Paul R points out, you MUST use a window to reduce spectral leakage.
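The bin arithmetic above can be checked with a couple of one-line helpers. This is only a sketch; the function names are invented for the example.

```c
#include <stddef.h>

/* Width of one FFT bin in Hz for an n-point transform (0 if n is 0). */
float fft_bin_width(float sample_rate, size_t n)
{
    return n ? sample_rate / (float)n : 0.0f;
}

/* Index of the bin whose centre frequency is nearest to freq_hz. */
size_t fft_bin_for_frequency(float freq_hz, float sample_rate, size_t n)
{
    float width = fft_bin_width(sample_rate, n);
    if (width <= 0.0f || freq_hz <= 0.0f)
        return 0;
    return (size_t)(freq_hz / width + 0.5f);
}
```

With a 1024-point block, fft_bin_width gives about 43.07 Hz at 44.1 kHz and exactly 7.8125 Hz at 8 kHz, so a low E (about 82.4 Hz) moves from bin 2 to bin 11.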

无人问我粥可暖 2024-12-08 09:27:50

The frequency response function of the iPhone drops off below 100 - 200 Hz (see http://blog.faberacoustical.com/2009/ios/iphone/iphone-microphone-frequency-response-comparison/ for an example).

If you are trying to detect the fundamental mode of a low guitar string, the microphone might be acting as a filter and suppressing the frequency you are interested in. There are a couple of options if you are interested in using the FFT data you can get. You can window the data in the frequency domain around the note you are trying to detect, so that all you can see is the first mode even if it is of lower magnitude than the higher modes (i.e. have a toggle to tune the first string and put it in this mode).
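One way to read "windowing in the frequency domain" is simply to restrict the peak search to a band around the expected note. The sketch below assumes the interleaved real/imaginary layout that the question's loop reads from outputBuffer; the function name and parameters are hypothetical.

```c
#include <stddef.h>

/* Illustrative helper: return the index of the strongest bin whose centre
 * frequency lies in [lo_hz, hi_hz], or -1 if the band contains no energy.
 * spectrum holds interleaved (re, im) pairs for bins 0..n_bins-1, and
 * bin k is centred at k * bin_hz. */
long peak_bin_in_band(const float *spectrum, size_t n_bins, float bin_hz,
                      float lo_hz, float hi_hz)
{
    long best = -1;
    float best_power = 0.0f;
    for (size_t k = 0; k < n_bins; k++) {
        float f = (float)k * bin_hz;
        if (f < lo_hz || f > hi_hz)
            continue;                      /* outside the note's band */
        float re = spectrum[2 * k];
        float im = spectrum[2 * k + 1];
        float power = re * re + im * im;   /* magnitude squared */
        if (power > best_power) {
            best_power = power;
            best = (long)k;
        }
    }
    return best;
}
```

For the low E string, a band such as 70-95 Hz would keep a louder second harmonic near 165 Hz from winning. Note that in the packed output of vDSP_fft_zrip the imaginary slot of bin 0 holds the Nyquist value, so the band should start above bin 0.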

Or you can low-pass filter the sound data. You can do this either in the time domain or, even easier since you already have frequency-domain data, in the frequency domain. A very simple time-domain low-pass filter is a moving-average filter. A very simple frequency-domain low-pass filter is to multiply your FFT magnitudes by a vector with 1's in the low-frequency range and a linear ramp (or even a step) down in the higher frequencies.
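Both filters described above fit in a few lines of plain C. This is a sketch with made-up names and parameters (taps, pass_bin, stop_bin), not a tuned design.

```c
#include <stddef.h>

/* Time domain: causal moving average over the last `taps` samples.
 * in and out must not overlap. */
void moving_average(const float *in, float *out, size_t n, size_t taps)
{
    if (taps == 0) taps = 1;
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += in[i];
        if (i >= taps)
            sum -= in[i - taps];           /* drop the sample leaving the window */
        size_t count = (i + 1 < taps) ? i + 1 : taps;
        out[i] = sum / (float)count;
    }
}

/* Frequency domain: gain is 1 up to pass_bin, falls linearly to 0 at
 * stop_bin, and is 0 above it. With pass_bin >= stop_bin it becomes a step. */
void lowpass_ramp(float *mag, size_t n_bins, size_t pass_bin, size_t stop_bin)
{
    for (size_t k = 0; k < n_bins; k++) {
        float gain;
        if (k <= pass_bin)
            gain = 1.0f;
        else if (k >= stop_bin)
            gain = 0.0f;
        else
            gain = (float)(stop_bin - k) / (float)(stop_bin - pass_bin);
        mag[k] *= gain;
    }
}
```

Scaling the magnitudes this way only changes which bin the peak search picks. Turning the result back into audio would require applying the same gain to the complex bins before an inverse FFT.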
