Core Audio, Goertzel algorithm not working

Posted 2024-10-21 18:43:32


I am currently creating an application which works out, in real time, the magnitude at a predefined frequency (16780 Hz) in sound from the iPhone's microphone.

I have the sound data in a buffer and attempt to process it using the Goertzel algorithm, which is designed for this task. This is where the problem begins.

The algorithm gives a very strong response when the recorded sound is at a frequency much lower (5000 Hz) than the target one (16780 Hz). In fact, the response is far stronger than the one produced when a sound at the correct frequency is recorded.

Here is my implementation of Goertzel:

double goertzel(unsigned short *sample, int sampleRate, double Freq, int len)
{
    // Recurrence coefficients for the target bin.
    double realW = 2.0 * cos(2.0 * M_PI * Freq / sampleRate);
    double imagW = 2.0 * sin(2.0 * M_PI * Freq / sampleRate);
    double d1 = 0;
    double d2 = 0;
    double y;

    // Goertzel recurrence over the sample block.
    for (int i = 0; i < len; i++) {
        y  = (double)(signed short)sample[i] + realW * d1 - d2;
        d2 = d1;
        d1 = y;
    }

    // Final real/imaginary parts of the selected bin.
    double rR = 0.5 * realW * d1 - d2;
    double rI = 0.5 * imagW * d1 - d2;

    return sqrt(pow(rR, 2) + pow(rI, 2)) / len;
} /* end function goertzel */

Here is how I retrieve the audio, in case it is relevant:

-(void)startListeningWithFrequency:(float)frequency;
{
    OSStatus status;

    // Describe and instantiate the RemoteIO audio unit.
    AudioComponentDescription desc;
    desc.componentType = kAudioUnitType_Output;
    desc.componentSubType = kAudioUnitSubType_RemoteIO;
    desc.componentFlags = 0;
    desc.componentFlagsMask = 0;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;

    AudioComponent inputComponent = AudioComponentFindNext(NULL, &desc);
    status = AudioComponentInstanceNew(inputComponent, &audioUnit);
    checkStatus(status);

    // Enable input on the microphone bus.
    UInt32 flag = 1;
    status = AudioUnitSetProperty(audioUnit, kAudioOutputUnitProperty_EnableIO,
                                  kAudioUnitScope_Input, kInputBus, &flag, sizeof(flag));
    checkStatus(status);

    // 16-bit signed mono PCM at 44.1 kHz.
    AudioStreamBasicDescription audioFormat;
    audioFormat.mSampleRate       = 44100.00;
    audioFormat.mFormatID         = kAudioFormatLinearPCM;
    audioFormat.mFormatFlags      = kAudioFormatFlagIsPacked | kAudioFormatFlagIsSignedInteger;
    audioFormat.mFramesPerPacket  = 1;
    audioFormat.mChannelsPerFrame = 1;
    audioFormat.mBitsPerChannel   = 16;
    audioFormat.mBytesPerPacket   = 2;
    audioFormat.mBytesPerFrame    = 2;

    status = AudioUnitSetProperty(audioUnit,
                                  kAudioUnitProperty_StreamFormat,
                                  kAudioUnitScope_Output,
                                  kInputBus,
                                  &audioFormat,
                                  sizeof(audioFormat));
    checkStatus(status);

    // Register the input callback.
    AURenderCallbackStruct callbackStruct;
    callbackStruct.inputProc = recordingCallback;
    callbackStruct.inputProcRefCon = self;
    status = AudioUnitSetProperty(audioUnit,
                                  kAudioOutputUnitProperty_SetInputCallback,
                                  kAudioUnitScope_Global,
                                  kInputBus, &callbackStruct, sizeof(callbackStruct));
    checkStatus(status);

    status = AudioOutputUnitStart(audioUnit);
}
static OSStatus recordingCallback(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData) {
    // Allocate a buffer for this render cycle's samples.
    AudioBuffer buffer;
    buffer.mNumberChannels = 1;
    buffer.mDataByteSize = inNumberFrames * 2;
    buffer.mData = malloc(inNumberFrames * 2);

    // Put the buffer in an AudioBufferList.
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0] = buffer;

    OSStatus status;
    status = AudioUnitRender(audioUnit,
                             ioActionFlags,
                             inTimeStamp,
                             inBusNumber,
                             inNumberFrames,
                             &bufferList);
    checkStatus(status);

    UInt16 *q = (UInt16 *)(&bufferList)->mBuffers[0].mData;
    int N = sizeof(q)/sizeof(UInt16);
    double g = goertzel(q, 44100, 16780, N);

    NSLog(@"goertzel:%f", g);
    return noErr;
}

This returns numbers in the hundreds for frequencies much lower than 16780 Hz, and much smaller numbers for frequencies at 16780 Hz.

I am very frustrated and help would be greatly appreciated.


Comments (2)

錯遇了你 2024-10-28 18:43:32


Just a guess:

According to the Nyquist–Shannon sampling theorem, the sampling rate should be at least twice the frequency that you are trying to measure. And yours is, but just barely. A sampling rate of 44.1kHz is the outer edge for measuring 22kHz signals. A signal of 16kHz is close enough to the limit that aliasing might cause problems with your wave analysis. Here's a picture to illustrate my point:
[figure: a sine wave sampled near the Nyquist limit, illustrating how aliasing can distort the analyzed waveform]

So, I would guess that you need a higher sample rate. Why don't you try running a pure 16kHz sine wave through the algorithm, to see if it does better with that? Aliasing will be less of an issue if you only have a single frequency in the test data. If you get a higher response from the sine wave, then you probably just need a higher sampling rate.
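A minimal offline harness along these lines (my sketch, not part of the original answer; the tone amplitude, buffer length, and test frequencies are arbitrary values chosen for illustration) could feed a synthetic sine wave through the goertzel() function from the question and compare on-target and off-target magnitudes:

#include <math.h>
#include <stdio.h>

// Assumes the goertzel() function from the question is linked in.
double goertzel(unsigned short *sample, int sampleRate, double Freq, int len);

int main(void)
{
    const int sampleRate = 44100;
    const int len = 4410;                 // 100 ms of audio
    static signed short buf[4410];

    // Synthesize a pure 16 kHz tone at roughly half full scale.
    for (int i = 0; i < len; i++) {
        buf[i] = (signed short)(16000.0 * sin(2.0 * M_PI * 16000.0 * i / sampleRate));
    }

    // A working detector should report a much larger magnitude on-target.
    printf("mag @ 16000 Hz: %f\n", goertzel((unsigned short *)buf, sampleRate, 16000.0, len));
    printf("mag @  5000 Hz: %f\n", goertzel((unsigned short *)buf, sampleRate, 5000.0, len));
    return 0;
}

If the on-target magnitude wins here but not with live microphone input, the problem is more likely in the capture path or the microphone's high-frequency response than in the algorithm itself.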

ぺ禁宫浮华殁 2024-10-28 18:43:32


It looks like the resonator used in your Goertzel filter is a first-order approximation to a 1-pole resonator, which loses a great deal of accuracy and stability at high phase angles per step. A 1-bin DFT using a better approximation to the trig functions might work better at such high frequencies.

And the iPhone microphone's frequency response likely rolls off at such high frequencies.

ADDED:

For a 1-bin DFT, try this in your inner loop:

d1 += (double)sample[i] * cos(2.0*M_PI*i*Freq/sampleRate);
d2 += (double)sample[i] * sin(2.0*M_PI*i*Freq/sampleRate);

Then return:

dR = d1;
dI = d2;
magnitude = sqrt(dR*dR + dI*dI) / (double)len;

Note that for a fixed frequency and sample rate the trig functions can be pre-calculated outside of the audio callback and saved in an array or lookup table. If you don't do some optimization like this, calling multiple double precision transcendental functions inside your audio callback may be way too slow and/or waste a lot of battery power, but may simulate OK on a typical fast PC.
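As a rough illustration of that precomputation (my sketch; the table size, the function names prepareTrigTables and dftBinMagnitude, and the fixed maximum block length are assumptions for illustration, not part of the answer):

#include <math.h>

#define MAX_LEN 4096                     // assumed upper bound on block length

static double cosTable[MAX_LEN];
static double sinTable[MAX_LEN];

// Call once, outside the audio callback, for a fixed Freq and sampleRate,
// e.g. prepareTrigTables(16780.0, 44100); at startup.
void prepareTrigTables(double Freq, int sampleRate)
{
    for (int i = 0; i < MAX_LEN; i++) {
        cosTable[i] = cos(2.0 * M_PI * i * Freq / sampleRate);
        sinTable[i] = sin(2.0 * M_PI * i * Freq / sampleRate);
    }
}

// 1-bin DFT over one block using the precomputed tables; requires len <= MAX_LEN.
double dftBinMagnitude(const signed short *sample, int len)
{
    double dR = 0.0, dI = 0.0;
    for (int i = 0; i < len; i++) {
        dR += (double)sample[i] * cosTable[i];
        dI += (double)sample[i] * sinTable[i];
    }
    return sqrt(dR * dR + dI * dI) / (double)len;
}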

The DFT is defined for a length that is an exact integral number of periods of the bin frequency Freq, but other lengths will work as approximations, with varying amounts of so-called spectral "leakage" and/or scalloping error. The width of the filter's frequency response is roughly inversely proportional to the DFT length. Also, the closer the frequency is to Fs/2, the longer the DFT needs to be to avoid complex image aliasing; multiple periods of length N*Fs/(Fs/2 - Freq) might be a better length. You may need to save or queue up samples to get an appropriate length (rather than just using the buffer length the audio callback hands you).
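A hedged sketch of that queuing idea (the window length and the helper name enqueueAndAnalyze are illustrative; it reuses dftBinMagnitude() from the previous sketch): accumulate incoming callback blocks until a longer analysis window is full, then analyze the whole window. For Fs = 44100 and Freq = 16780, Fs/2 - Freq = 5270 Hz, so Fs/(Fs/2 - Freq) is only about 8.4 samples per image-beat period, and a window of a few thousand samples covers many such periods.

#include <stdio.h>

double dftBinMagnitude(const signed short *sample, int len);  // from the sketch above

#define ANALYSIS_LEN 4096                // assumed window; matches MAX_LEN above

static signed short analysisBuf[ANALYSIS_LEN];
static int analysisCount = 0;

// Feed each callback block in here; runs the detector once per full window.
void enqueueAndAnalyze(const signed short *block, int blockLen)
{
    for (int i = 0; i < blockLen && analysisCount < ANALYSIS_LEN; i++) {
        analysisBuf[analysisCount++] = block[i];
    }
    if (analysisCount == ANALYSIS_LEN) {
        double mag = dftBinMagnitude(analysisBuf, ANALYSIS_LEN);
        printf("magnitude: %f\n", mag);  // report or log the result
        analysisCount = 0;               // start collecting the next window
    }
}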
