Explain FFT to me

Posted on 2024-08-01 22:28:39

I want to take audio PCM data and find peaks in it. Specifically, I want to return the frequency and time at which a peak occurs.

My understanding of this is that I have to take the PCM data and dump it into an array, setting it as the real values with the complex parts set to 0. I then take the FFT, and I get an array back. If each number in the array is a magnitude value, how do I get the frequency associated with each one? Also, do I take the magnitude of the real & complex part or just discard the complex values?

Finally, if I wanted to find the peaks in a single song, do I just set a small window to FFT and slide it across all of the audio? Any suggestions on how large that window should be?

Comments (3)

花开雨落又逢春i 2024-08-08 22:28:39

If the sample rate of your PCM data is F, then the highest frequency component in the FFT is F/2 (the Nyquist frequency). Suppose your PCM data was sampled at 44100 Hz; then your FFT values will run from 0 Hz (DC) to 22050 Hz. If you start with N samples (N being a power of 2), the FFT may return N/2 values representing all positive frequencies from 0 to F/2, or it may return N values that also include the negative frequencies from -F/2 to 0. For real-valued input such as PCM audio, the negative-frequency values are the complex conjugates of the positive ones, so they carry no extra information. Check the specification of your FFT implementation to find out which frequency each array item maps to.
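As a rough illustration of that index-to-frequency mapping, here is a minimal sketch. It assumes one common layout (indices 0 to N/2 hold 0 Hz up to F/2, the rest hold the negative frequencies); `bin_frequencies` is a hypothetical helper name, and your FFT library may order its output differently.

```python
def bin_frequencies(n, sample_rate):
    """Return the frequency in Hz for each output index of an n-point FFT.

    Assumed layout (check your library): indices 0..n/2 run from 0 Hz up to
    sample_rate/2, and the remaining indices are the negative frequencies.
    """
    freqs = []
    for k in range(n):
        if k <= n // 2:
            freqs.append(k * sample_rate / n)
        else:
            freqs.append((k - n) * sample_rate / n)
    return freqs


# Tiny example: an 8-point FFT of audio sampled at 44100 Hz.
freqs = bin_frequencies(8, 44100)
print(freqs)
```

In this layout, index 4 of the 8-point example corresponds to F/2 = 22050 Hz and index 7 to -5512.5 Hz.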

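To find the peaks, you need to look at the magnitude of the FFT values. The magnitude of each complex value is the square root of the sum of its squared real and imaginary parts. If you only want to locate peaks, you can skip the square root and compare the squared magnitudes, since the ordering is the same.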
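A small sketch of that magnitude step, using only the Python standard library. The `spectrum` list is made-up data standing in for real FFT output, and `magnitudes` and `peak_bin` are hypothetical helper names.

```python
import math


def magnitudes(spectrum):
    """Magnitude of each complex FFT value: sqrt(real^2 + imag^2)."""
    return [math.hypot(c.real, c.imag) for c in spectrum]


def peak_bin(spectrum):
    """Index of the largest value, compared by squared magnitude (no sqrt needed)."""
    powers = [c.real ** 2 + c.imag ** 2 for c in spectrum]
    return max(range(len(powers)), key=powers.__getitem__)


# Made-up stand-in for FFT output.
spectrum = [complex(1, 0), complex(3, 4), complex(0, -2)]
mags = magnitudes(spectrum)
print(mags)                # [1.0, 5.0, 2.0]
print(peak_bin(spectrum))  # 1
```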

Suppose your FFT of N PCM samples returns N/2 complex values representing positive frequencies. Those N/2 values span 0 to F/2, so the distance between two adjacent complex values is (F/2)/(N/2) = F/N Hz. With F = 44100 Hz and N = 1024 samples, this is about 43.1 Hz. This is your frequency resolution. If you need to find lower frequency beats, the FFT window will need to be extended.
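To make the resolution concrete, the sketch below runs a textbook recursive radix-2 FFT (written out here only so the example is self-contained; in practice you would use a library) on a synthetic 1000 Hz tone and reports the peak bin. The bin spacing is taken as the sample rate divided by the number of input samples. The tone frequency and all names are assumptions chosen for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


sample_rate = 44100
n = 1024
tone_hz = 1000.0  # assumed test tone, not from the original post

samples = [math.sin(2 * math.pi * tone_hz * i / sample_rate) for i in range(n)]
spectrum = fft(samples)

# Only the first n/2 values are needed for real input (positive frequencies).
half = spectrum[: n // 2]
peak = max(range(len(half)), key=lambda k: abs(half[k]))

resolution = sample_rate / n   # spacing between adjacent bins, in Hz
peak_hz = peak * resolution    # frequency of the strongest bin

print(f"resolution: {resolution:.2f} Hz")
print(f"peak bin {peak} -> {peak_hz:.1f} Hz")
```

The reported peak lands on the bin nearest to 1000 Hz rather than on 1000 Hz itself, which is the practical meaning of the frequency resolution.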

别靠近我心 2024-08-08 22:28:39

Well,
start with a raw array of 512 complex numbers expressing the input wave. Before processing it with the FFT, we put the samples in the real parts and set the imaginary parts to zero (according to the intended use), then pass the array to the FFT. The sample rate here is 8192 Hz.

Now we have an array of 512 FFT output values. Each value is a complex number, and each complex number expresses several useful quantities, such as the magnitude and phase of one frequency component.

To get the fundamental frequency (the spacing between adjacent FFT values), we divide the sample rate by the buffer size:

8192/512 = 16;

16 is the resolution of the FFT values in Hz. It means we only learn about high-amplitude frequencies near the multiples of 16.

For example, suppose we have a wave with these components:

frequency : 3 48 23 128
Amplitude : 10 5 12 8 dB (ref = 1)

after FFT we get:

frequency : 0 32 64 128
Amplitude : 9 8 2 8

The FFT output is in the frequency domain, meaning its values are arranged by frequency.
The time domain, on the other hand, means arranging by time, the way we listen to music from second zero to second N.

The FFT output can only be read by frequency, from frequency 0 upward.

It arranges frequencies in ascending order. It does not work on the continuous audio signal, which would mean taking a sample every nanosecond and finer, approaching infinitely many samples. Instead, the audio is sampled once every (1/sample rate) seconds. These samples are buffered (in our case, 512 at a time), each buffer of 512 samples is passed to the FFT, and the output is 512 FFT values.

Since the FFT arranges its output by frequency, the time ordering within the buffer is lost: the values are now arranged according to frequency, not time.

The frequencies appear at regular intervals equal to the fundamental frequency, which is the sample rate divided by the buffer size, in our case 8192/512 = 16.

So the power is shown every 16 Hz, and the power of a component that falls between two indices shows up at the nearest indices according to how close it is to each.

Higher resolution can be achieved by using a larger buffer at the same sample rate. Raising the sample rate alone, with the same buffer size, makes the spacing coarser, not finer.

To show the frequencies, we print the indices in ascending order, each next to its amplitude.

Amplitude = 20log10(output/ref)
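A minimal sketch of that decibel conversion, assuming `output` is a linear magnitude and `ref` defaults to 1 as in the example above. `to_decibels` is a hypothetical helper name, and returning negative infinity for non-positive input is a choice made here, not something from the original answer.

```python
import math


def to_decibels(output, ref=1.0):
    """Convert a linear magnitude to dB relative to ref: 20 * log10(output / ref)."""
    if output <= 0:
        # log10 is undefined for zero or negative values; treat as silence.
        return float("-inf")
    return 20 * math.log10(output / ref)


print(to_decibels(10.0))  # 20.0
print(to_decibels(1.0))   # 0.0
```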

The amplitude printed next to each index shows the power at that frequency, and the result gets more accurate as the resolution gets finer.

In conclusion, the FFT produces an indexed list of amplitudes, where each amplitude expresses the power at the frequency corresponding to its index.

橘和柠 2024-08-08 22:28:39

You may actually be looking for a spectrogram, which is basically an FFT of the data in a small window that's slid along the time axis. If you have software that implements this, it might save you some effort. It's what's commonly used for analysing time varying acoustic signals, and is a very useful way to look at sounds. Also, there are some tricks, for example, with windowing data for FFTs, that the spectrogram will probably get right, but will be harder (though not very hard) for you to do correctly.
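For illustration only, here is a rough sketch of the sliding-window idea with a Hann window applied to each frame, returning the (time, frequency) of the strongest bin per frame. It uses a textbook radix-2 FFT so that it is self-contained; the window size, hop size, test signal, and every function name are assumptions, and a real spectrogram routine from a signal-processing library would be the more robust choice.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hann(n):
    """Hann window, tapering each frame to zero at its edges to reduce leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def peak_track(samples, sample_rate, window_size=256, hop=128):
    """Slide a window over the samples; return (start time in s, peak Hz) per frame."""
    window = hann(window_size)
    peaks = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + window_size], window)]
        half = fft(frame)[: window_size // 2]  # positive frequencies only
        k = max(range(len(half)), key=lambda i: abs(half[i]))
        peaks.append((start / sample_rate, k * sample_rate / window_size))
    return peaks


# Assumed test signal: 0.25 s at 500 Hz followed by 0.25 s at 1500 Hz.
sample_rate = 8192
samples = [math.sin(2 * math.pi * 500 * i / sample_rate) for i in range(2048)]
samples += [math.sin(2 * math.pi * 1500 * i / sample_rate) for i in range(2048)]

peaks = peak_track(samples, sample_rate)
print(peaks[0])   # first frame: peak near 500 Hz
print(peaks[-1])  # last frame: peak near 1500 Hz
```

The window size is a trade-off: a longer window gives finer frequency spacing (sample rate divided by window size) but blurs when in time the peak occurred, and a shorter window does the opposite.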
