FFT on iPhone to ignore background noise and find lower pitches

Posted 2024-12-01 09:27:50

I have implemented Demetri's Pitch Detector project for the iPhone and am running into two problems: 1) any sort of background noise sends the frequency reading bananas, and 2) lower-frequency sounds aren't being pitched correctly. I tried to tune my guitar, and while the higher strings worked, the tuner could not correctly discern the low E.

The Pitch Detection code is located in RIOInterface.mm and goes something like this ...

// get the data
AudioUnitRender(...);

// convert int16 to float
Convert(...);

// divide the signal into even-odd configuration
vDSP_ctoz((COMPLEX*)outputBuffer, 2, &A, 1, nOver2);

// apply the fft
vDSP_fft_zrip(fftSetup, &A, stride, log2n, FFT_FORWARD);

// convert split-complex form back to interleaved complex
vDSP_ztoc(&A, 1, (COMPLEX *)outputBuffer, 2, nOver2);

Demetri then goes on to determine the 'dominant' frequency as follows:

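// Find the bin with the largest squared magnitude.
// Note: despite their names, dominantFrequency and curFreq hold magnitudes, not frequencies.
float dominantFrequency = 0;
int bin = -1;
for (int i=0; i<n; i+=2) {
    // outputBuffer is interleaved: real part at i, imaginary part at i+1
    float curFreq = MagnitudeSquared(outputBuffer[i], outputBuffer[i+1]);
    if (curFreq > dominantFrequency) {
        dominantFrequency = curFreq;
        bin = (i+1)/2;  // integer division, so this equals i/2
    }
}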
memset(outputBuffer, 0, n*sizeof(SInt16));

// Update the UI with our newly acquired frequency value.
[THIS->listener frequencyChangedWithValue:bin*(THIS->sampleRate/bufferCapacity)];

To start with, I believe I need to apply a LOW PASS FILTER ... but I'm not an FFT expert and not sure exactly where or how to do that against the data returned from the vDSP functions. I'm also not sure how to improve the accuracy of the code in the lower frequencies. There seem to be other algorithms to determine the dominant frequency - but again, looking for a kick in the right direction when using the data returned by Apple's Accelerate framework.

UPDATE:

The Accelerate framework actually has some windowing functions. I set up a basic window like this:

windowSize = maxFrames;
transferBuffer = (float*)malloc(sizeof(float)*windowSize);
window = (float*)malloc(sizeof(float)*windowSize);
memset(window, 0, sizeof(float)*windowSize);
vDSP_hann_window(window, windowSize, vDSP_HANN_NORM);

which I then apply by inserting

vDSP_vmul(outputBuffer, 1, window, 1, transferBuffer, 1, windowSize);

before the vDSP_ctoz function. I then changed the rest of the code to use 'transferBuffer' instead of outputBuffer, but so far I haven't noticed any dramatic changes in the final pitch guess.

错々过的事 2024-12-08 09:27:50

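Pitch is not the same as the peak-magnitude frequency bin (which is what the FFT in the Accelerate framework might be giving you directly), so any peak frequency detector will be unreliable for pitch estimation. A low-pass filter will not help when the note has a missing or very weak fundamental (common in some voice, piano and guitar sounds) and/or lots of powerful overtones in its spectrum.

Look at a wide-band spectrum or spectrogram of your musical sounds and you will see the problem.

Other methods are usually needed for a more reliable pitch estimate. These include autocorrelation methods (AMDF, ASDF), cepstrum/cepstral analysis, harmonic product spectrum, phase vocoder, and/or composite algorithms such as RAPT (Robust Algorithm for Pitch Tracking) and YAAPT. An FFT is only useful as a sub-part of some of these methods.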

爱你是孤单的心事 2024-12-08 09:27:50

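At the very least you need to apply a window function to your time-domain data before calculating the FFT. Without this step the power spectrum will contain artefacts (see: spectral leakage) which will interfere with your attempts to extract pitch information.

A simple Hann (aka Hanning) window should suffice.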

无畏 2024-12-08 09:27:50

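What are your sample frequency and block size? Low E is about 80 Hz (82.4 Hz in standard tuning), so you need to make sure your capture block is long enough to capture many cycles at this frequency. This is because the Fourier transform divides the frequency spectrum into bins, each several Hz wide. For example, if you sample at 44.1 kHz and have a 1024-point time-domain block, each bin will be 44100/1024 = 43.07 Hz wide, so the low E would fall in the second bin. For a bunch of reasons (to do with spectral leakage and the nature of finite time blocks), in practice you should regard the first 3 or 4 bins of an FFT result with extreme suspicion.

If you drop your sample rate to 8 kHz, the same block size gives you bins that are 7.8125 Hz wide. Now the low E will be in the 10th or 11th bin, which is much better. You could also use a longer block size.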
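
The arithmetic above can be sketched in a few lines of C. The helper names are hypothetical and exist only for this example; the 82.41 Hz figure for the low E string assumes standard tuning.

```c
/* Width in Hz of one FFT bin for a given sample rate and block size. */
double bin_width_hz(double sample_rate, int block_size) {
    return sample_rate / block_size;
}

/* Fractional bin position of a frequency (0 = DC). */
double bin_position(double freq_hz, double sample_rate, int block_size) {
    return freq_hz / bin_width_hz(sample_rate, block_size);
}

/* Smallest power-of-two block size whose bins are no wider than max_width_hz. */
int block_size_for_resolution(double sample_rate, double max_width_hz) {
    int n = 1;
    while (sample_rate / n > max_width_hz) {
        n *= 2;
    }
    return n;
}
```

With these, bin_width_hz(44100.0, 1024) is about 43.07 Hz and bin_width_hz(8000.0, 1024) is 7.8125 Hz, while bin_position(82.41, 8000.0, 1024) is about 10.5. If you would rather stay at 44.1 kHz, block_size_for_resolution(44100.0, 8.0) gives 8192 samples, which is roughly 186 ms of audio per block.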

As Paul R pointed out, you must use a window to reduce spectral leakage.
