VB FFT - trouble understanding how the results relate to frequency

Posted on 2024-07-06 22:23:17

I'm trying to understand an FFT (Fast Fourier Transform) routine I'm using (stealing) (recycling).

The input is an array of 512 data points which are a sample waveform.
Test data is generated into this array, and the FFT transforms it into the frequency domain.
I'm trying to understand the relationship between frequency, period, sample rate and position in the FFT array. I'll illustrate with examples:

========================================

Sample rate is 1000 samples/s.
Generate a set of samples at 10Hz.

Input array has peak values at arr(28), arr(128), arr(228) ...
period = 100 sample points

peak value in fft array is at index 6 (excluding a huge value at 0)

========================================

Sample rate is 8000 samples/s
Generate set of samples at 440Hz

Input array peak values include arr(7), arr(25), arr(43), arr(61) ...
period = 18 sample points

peak value in fft array is at index 29 (excluding a huge value at 0)

========================================

How do I relate the index of the peak in the FFT array to frequency?

Comments (8)

一指流沙 2024-07-13 22:23:17

If you ignore the imaginary part, the frequency distribution is linear across bins:

Frequency@i = (Sampling rate/2)*(i/Nbins), where Nbins is the number of bins between 0 and half the sampling rate.

So for your first example, assuming you had 256 bins, the largest bin corresponds to a frequency of 1000/2 * 6/256 = 11.7 Hz.
Since your input was 10 Hz, I'd guess that bin 5 (9.8 Hz) also had a big component.
To get better accuracy, you need to take more samples, to get smaller bins.

Your second example gives 8000/2 * 29/256 = 453 Hz. Again, close, but you need more bins.
Your resolution here is only 4000/256 = 15.6 Hz.
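
The arithmetic above can be checked with a short sketch. It assumes the 512-point input from the question and uses the equivalent form index * sample_rate / N (the same as the formula above when Nbins = N/2). The function names are made up for illustration.

```python
def bin_frequency(index, sample_rate, n_samples):
    """Centre frequency in Hz of bin `index` of an n_samples-point FFT."""
    return index * sample_rate / n_samples


def nearest_bin(frequency, sample_rate, n_samples):
    """Index of the bin whose centre frequency is closest to `frequency`."""
    return round(frequency * n_samples / sample_rate)


N = 512  # input length stated in the question

# (sample rate, reported peak index, generated frequency) from the two examples
for rate, peak, tone in [(1000, 6, 10), (8000, 29, 440)]:
    print(
        f"rate={rate}: bin width {rate / N:.3f} Hz, "
        f"index {peak} -> {bin_frequency(peak, rate, N):.2f} Hz, "
        f"nearest bin to {tone} Hz is {nearest_bin(tone, rate, N)}"
    )
```

Under these assumptions the reported peaks map to about 11.7 Hz and 453.1 Hz, while the bins nearest to 10 Hz and 440 Hz come out as 5 and 28. That is one below the reported indices, which might point to an off-by-one in how the routine's output array is indexed, although nothing in the question confirms this.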

浅忆 2024-07-13 22:23:17

It would be helpful if you were to provide your sample dataset.

My guess would be that you have what are called sampling artifacts. The strong signal at DC (frequency 0) suggests that this is the case.

You should always ensure that the average value in your input data is zero. Finding the average and subtracting it from each sample point before invoking the FFT is good practice.

Along the same lines, you have to be careful about the sampling window artifact. It is important that the first and last data points are close to zero, because otherwise the "step" from outside to inside the sampling window has the effect of injecting a whole lot of energy at different frequencies.

The bottom line is that doing an FFT analysis requires more care than simply recycling an FFT routine found somewhere.

Here are the first 100 sample points of a 10 Hz signal as described in the question, massaged to avoid sampling artifacts:

> sinx[1:100]
  [1]  0.000000e+00  6.279052e-02  1.253332e-01  1.873813e-01  2.486899e-01  3.090170e-01  3.681246e-01  4.257793e-01  4.817537e-01  5.358268e-01
 [11]  5.877853e-01  6.374240e-01  6.845471e-01  7.289686e-01  7.705132e-01  8.090170e-01  8.443279e-01  8.763067e-01  9.048271e-01  9.297765e-01
 [21]  9.510565e-01  9.685832e-01  9.822873e-01  9.921147e-01  9.980267e-01  1.000000e+00  9.980267e-01  9.921147e-01  9.822873e-01  9.685832e-01
 [31]  9.510565e-01  9.297765e-01  9.048271e-01  8.763067e-01  8.443279e-01  8.090170e-01  7.705132e-01  7.289686e-01  6.845471e-01  6.374240e-01
 [41]  5.877853e-01  5.358268e-01  4.817537e-01  4.257793e-01  3.681246e-01  3.090170e-01  2.486899e-01  1.873813e-01  1.253332e-01  6.279052e-02
 [51] -2.542075e-15 -6.279052e-02 -1.253332e-01 -1.873813e-01 -2.486899e-01 -3.090170e-01 -3.681246e-01 -4.257793e-01 -4.817537e-01 -5.358268e-01
 [61] -5.877853e-01 -6.374240e-01 -6.845471e-01 -7.289686e-01 -7.705132e-01 -8.090170e-01 -8.443279e-01 -8.763067e-01 -9.048271e-01 -9.297765e-01
 [71] -9.510565e-01 -9.685832e-01 -9.822873e-01 -9.921147e-01 -9.980267e-01 -1.000000e+00 -9.980267e-01 -9.921147e-01 -9.822873e-01 -9.685832e-01
 [81] -9.510565e-01 -9.297765e-01 -9.048271e-01 -8.763067e-01 -8.443279e-01 -8.090170e-01 -7.705132e-01 -7.289686e-01 -6.845471e-01 -6.374240e-01
 [91] -5.877853e-01 -5.358268e-01 -4.817537e-01 -4.257793e-01 -3.681246e-01 -3.090170e-01 -2.486899e-01 -1.873813e-01 -1.253332e-01 -6.279052e-02

And here are the resulting absolute values of the FFT in the frequency domain:

 [1] 7.160038e-13 1.008741e-01 2.080408e-01 3.291725e-01 4.753899e-01 6.653660e-01 9.352601e-01 1.368212e+00 2.211653e+00 4.691243e+00 5.001674e+02
[12] 5.293086e+00 2.742218e+00 1.891330e+00 1.462830e+00 1.203175e+00 1.028079e+00 9.014559e-01 8.052577e-01 7.294489e-01
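
The two precautions described above (zero mean, and end points near zero) could be sketched as follows. This is only an illustration: the answer does not name a specific window, so the Hann window here is an assumption, as are the helper names and the test signal with a DC offset of 2.0.

```python
import math


def remove_mean(samples):
    """Subtract the average so the DC bin (frequency 0) is close to zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def hann_window(samples):
    """Taper both ends of the buffer towards zero with a Hann window."""
    n = len(samples)
    return [
        s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]


# Hypothetical input: a 10 Hz sine at 1000 samples/s riding on a DC offset.
rate, n = 1000, 512
raw = [2.0 + math.sin(2.0 * math.pi * 10 * i / rate) for i in range(n)]

centered = remove_mean(raw)
prepared = hann_window(centered)  # this is what would be passed to the FFT

print("mean before:", sum(raw) / n)
print("mean after: ", sum(centered) / n)
print("end points: ", prepared[0], prepared[-1])
```

Windowing trades a slightly wider peak for much lower leakage into distant bins, so whether it is worth applying depends on what is being measured.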

緦唸λ蓇 2024-07-13 22:23:17

I'm a little rusty on math and signal processing too, but with the additional info I can give it a shot.

If you want to know the signal energy per bin, you need the magnitude of the complex output. Just looking at the real output is not enough, even when the input is only real numbers. For every bin the magnitude of the output is sqrt(real^2 + imag^2), just like Pythagoras :-)

Assuming a 1000-sample buffer, bins 0 to 499 are positive frequencies from 0 Hz up to just under 500 Hz. Bins 500 to 999 are negative frequencies, and for a real signal their magnitudes should mirror the positive ones. If you process one buffer every second, frequencies and array indices line up nicely. So the peak at index 6 corresponds with 6 Hz, which is a bit strange. This might be because you're only looking at the real output data, and the real and imaginary data combine to give the expected peak at index 10. The frequencies should map linearly to the bins.

The peak at 0 indicates a DC offset.
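
To illustrate why the real part alone can hide a peak, here is a minimal sketch that evaluates a few DFT bins directly from the definition (slow, but free of dependencies). It assumes one second of a pure 10 Hz sine at 1000 samples/s, so bin k sits at k Hz. The function name `dft_bin` is hypothetical and is not the routine from the question.

```python
import cmath
import math


def dft_bin(samples, k):
    """Bin k of the DFT, computed straight from the defining sum."""
    n = len(samples)
    return sum(
        x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
    )


rate = n = 1000  # one second of data, so bin k corresponds to k Hz
signal = [math.sin(2 * math.pi * 10 * i / rate) for i in range(n)]

# Only the first 50 bins are evaluated to keep the direct sum cheap.
bins = [dft_bin(signal, k) for k in range(50)]
magnitudes = [abs(z) for z in bins]  # abs() of a complex is sqrt(re^2 + im^2)

peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print("peak bin:", peak)
print("real part at peak:", bins[peak].real)
print("imag part at peak:", bins[peak].imag)
print("magnitude at peak:", magnitudes[peak])
```

For a sine that starts at phase zero, nearly all of the energy lands in the imaginary part of the bin, so a plot of the real part alone would show almost nothing at 10 Hz.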

甲如呢乙后呢 2024-07-13 22:23:17

It's been some time since I've done FFTs, but here's what I remember.

An FFT usually takes complex numbers as input and output, so I'm not really sure how the real and imaginary parts of the input and output map to your arrays.

I don't really understand what you're doing. In the first example, you say you process sample buffers at 10 Hz for a sample rate of 1000 Hz? Then you should have 10 buffers per second with 100 samples each. I don't get how your input array can be at least 228 samples long.

Usually the first half of the output buffer holds frequency bins from 0 frequency (= DC offset) to 1/2 the sample rate, and the second half holds negative frequencies. If your input is only real data, with 0 for the imaginary signal, the positive and negative frequencies have the same magnitudes. The relationship between the real and imaginary parts of the output contains the phase information of your input signal.

捂风挽笑 2024-07-13 22:23:17

The frequency for bin i is i * (sample rate / n), where n is the number of samples in the FFT's input window.

If you're handling audio, since pitch is proportional to log of frequency, the pitch resolution of the bins increases as the frequency does -- it's hard to resolve low frequency signals accurately. To do so you need to use larger FFT windows, which reduces time resolution. There is a tradeoff of frequency against time resolution for a given sample rate.
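
As a rough worked example of that tradeoff, the sketch below compares the fixed bin width with the spacing between neighbouring semitones. It assumes equal temperament and uses "bin width no wider than one semitone" as a crude criterion, which is a simplification rather than a precise resolvability limit. The function names are made up for illustration.

```python
import math

SEMITONE = 2 ** (1 / 12)  # frequency ratio between adjacent equal-tempered notes


def semitone_spacing(frequency):
    """Distance in Hz from `frequency` up to the next semitone."""
    return frequency * (SEMITONE - 1)


def min_window(frequency, sample_rate):
    """Smallest power-of-two FFT length whose bin width fits within one semitone."""
    needed = sample_rate / semitone_spacing(frequency)
    return 2 ** math.ceil(math.log2(needed))


rate = 8000  # sample rate from the question's second example
for note_hz in (55.0, 110.0, 440.0, 1760.0):
    window = min_window(note_hz, rate)
    print(
        f"{note_hz:7.1f} Hz: semitone spacing {semitone_spacing(note_hz):6.2f} Hz, "
        f"window {window} samples = {window / rate * 1000:.0f} ms"
    )
```

At 8000 samples/s a 512-point window is just enough by this criterion around 440 Hz, while 55 Hz would need about 4096 samples, roughly half a second of audio.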

You mention a bin with a large value at 0 -- this is the bin with frequency 0, i.e. the DC component. If this is large, then presumably your values are generally positive. Bin n/2 (in your case 256) is the Nyquist frequency, half the sample rate, which is the highest frequency that can be resolved in the sampled signal at this rate.

If the signal is real, then bins n/2+1 to n-1 will contain the complex conjugates of bins n/2-1 to 1, respectively. The DC value only appears once.
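
The layout described above can be verified on a small example. This sketch uses a direct O(n^2) DFT on a hypothetical 16-sample real signal (a DC offset of 1.0 plus a cosine at bin 3); it is not the routine from the question.

```python
import cmath
import math


def dft(samples):
    """Full DFT computed from the definition (fine for tiny inputs)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


n = 16
signal = [1.0 + math.cos(2 * math.pi * 3 * i / n + 0.4) for i in range(n)]
spectrum = dft(signal)

# Bins n/2+1 .. n-1 should be the complex conjugates of bins n/2-1 .. 1.
symmetric = all(
    abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9 for k in range(1, n // 2)
)

print("conjugate symmetric:", symmetric)
print("DC bin:", spectrum[0])            # sum of all samples, here n * 1.0
print("Nyquist bin:", spectrum[n // 2])  # purely real for real input
print("|bin 3|:", abs(spectrum[3]))      # n/2 for a unit-amplitude cosine
```

Because of this symmetry, only bins 0 to n/2 carry independent information for real input, which is why a 512-point transform is often described as having 256 usable bins.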

我的奇迹 2024-07-13 22:23:17

The samples are, as others have said, equally spaced in the frequency domain (not logarithmic).

For example 1, you should get this:

[image: http://home.comcast.net/~kootsoop/images/SINE1.jpg]

For the other example, you should get this:

[image: http://home.comcast.net/~kootsoop/images/SINE2.jpg]

So your answers both appear to be correct regarding the peak location.

What I'm not getting is the large DC component. Are you sure you are generating a sine wave as the input? Does the input go negative? For a sine wave, the DC should be close to zero provided you get enough cycles.

我乃一代侩神 2024-07-13 22:23:17

Another avenue is to set up a Goertzel algorithm for each note center frequency you are looking for.

Once you get one implementation of the algorithm working, you can make it take a parameter that sets its center frequency. With that you could easily run 88 of them, or whatever you need, in a collection and scan for the peak value.

The Goertzel algorithm is basically a single-bin FFT. Using this method you can place your bins logarithmically, as musical notes naturally go.

Some pseudo code from Wikipedia:

s_prev = 0
s_prev2 = 0
coeff = 2*cos(2*PI*normalized_frequency);
for each sample, x[n],
  s = x[n] + coeff*s_prev - s_prev2;
  s_prev2 = s_prev;
  s_prev = s;
end
power = s_prev2*s_prev2 + s_prev*s_prev - coeff*s_prev2*s_prev;

The two variables representing the previous two samples are maintained for the next iteration. This can then be used in a streaming application. I think perhaps the power calculation should be inside the loop as well. (However, it is not depicted as such in the Wiki article.)

In the tone detection case there would be 88 different coefficients and 88 pairs of previous samples, resulting in 88 power output samples indicating the relative level in each frequency bin.
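
A minimal Python sketch of that idea follows. It transcribes the pseudocode above and evaluates the power once per block rather than inside the loop. The 88 centre frequencies assume an equal-tempered piano with A4 = 440 Hz, and the 44100 samples/s rate, block length, and test tone are all hypothetical choices made for the example.

```python
import math


def goertzel_power(samples, target_hz, sample_rate):
    """Power of `samples` at `target_hz`, following the pseudocode above."""
    normalized_frequency = target_hz / sample_rate  # cycles per sample
    coeff = 2.0 * math.cos(2.0 * math.pi * normalized_frequency)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Evaluated once, after the whole block has been consumed.
    return s_prev2 * s_prev2 + s_prev * s_prev - coeff * s_prev2 * s_prev


def piano_frequencies():
    """Centre frequencies of the 88 piano keys (key 49 is A4 = 440 Hz)."""
    return [440.0 * 2 ** ((key - 49) / 12) for key in range(1, 89)]


# Hypothetical test block: a pure 440 Hz tone.
rate, n = 44100, 4096
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(n)]

frequencies = piano_frequencies()
powers = [goertzel_power(tone, f, rate) for f in frequencies]

best = max(range(len(powers)), key=powers.__getitem__)
detected = frequencies[best]
print("strongest key:", best + 1, "at", detected, "Hz")
```

Each detector costs one multiply and two additions per sample, so 88 of them are cheap compared with a large FFT, but the block length still determines how well neighbouring notes can be separated.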

冷月断魂刀 2024-07-13 22:23:17

WaveyDavey says that he's capturing sound from a mic, through the audio hardware of his computer, BUT that his results are not zero-centered. This sounds like a problem with the hardware. It SHOULD BE zero-centered.

When the room is quiet, the stream of values coming from the sound API should be very close to 0 amplitude, with slight +/- variations for ambient noise. If a vibratory sound is present in the room (e.g. a piano, a flute, a voice), the data stream should show a fundamentally sinusoidal wave that goes both positive and negative, and averages near zero. If this is not the case, the system has some funk going on!

-Rick
