Explain the FFT to me

Posted on 2024-08-01 22:28:39

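I want to take audio PCM data and find the peaks in it. Specifically, I want to return the frequency and the time at which each peak occurs.

My understanding is that I have to take the PCM data, put it into an array as the real values, and set the imaginary parts to 0. Then I run the FFT and get an array back. If each number in that array is a magnitude, how do I get the frequency associated with each one? Also, do I take the magnitude of the real and imaginary parts together, or just discard the imaginary values?

Finally, if I want to find the peaks across a whole song, do I just set a small window for the FFT and slide it across all of the audio? Any suggestions on how big that window should be?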

花开雨落又逢春i 2024-08-08 22:28:39

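If the PCM data is sampled at rate F, the highest frequency component in the FFT is F/2. For example, with PCM data sampled at 44100 Hz, the FFT values run from 0 Hz (DC) to 22050 Hz. If you start with N samples (N being a power of 2), the FFT may return N/2 values representing the positive frequencies from 0 to F/2, or it may return N values that also include the negative frequencies. Check the specification of your FFT implementation to find out which frequency each array item maps to.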
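As a rough illustration of that index-to-frequency mapping, here is a minimal Python sketch. The function name `bin_frequencies` and the output layout are assumptions chosen for the example (non-negative frequencies first, negative frequencies in the upper half of the array); a particular FFT library may order or truncate its output differently, and some real-input routines return N/2 + 1 values so that F/2 itself is included.

```python
def bin_frequencies(n_samples, sample_rate, positive_only=True):
    """Return the frequency in Hz assumed for each FFT output index.

    Assumed layout (hypothetical, check your library): index k maps to
    k * sample_rate / n_samples, and in a full-length output the indices
    from n_samples / 2 upward hold the negative frequencies.
    """
    spacing = sample_rate / n_samples
    if positive_only:
        # 0 Hz up to and including sample_rate / 2
        return [k * spacing for k in range(n_samples // 2 + 1)]
    freqs = []
    for k in range(n_samples):
        signed_k = k if k < n_samples / 2 else k - n_samples
        freqs.append(signed_k * spacing)
    return freqs


positive = bin_frequencies(1024, 44100)
print(positive[0], positive[1], positive[-1])
```

With 1024 samples at 44100 Hz, the first entry is 0 Hz (DC) and the last is 22050 Hz, matching the range described above.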

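To find peaks, look at the magnitude of the FFT values, not the real or imaginary part alone. The magnitude of each complex value is the square root of the sum of the squares of its real and imaginary parts; for comparing values against each other, the sum of squares without the square root works just as well.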
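The magnitude step can be sketched as follows. To keep the example self-contained it uses a slow, naive DFT written in plain Python as a stand-in for a real FFT routine; `naive_dft`, `magnitudes` and `peak_bin` are hypothetical helper names, not part of any library.

```python
import cmath
import math


def naive_dft(samples):
    """O(n^2) discrete Fourier transform, standing in for an FFT call."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitudes(spectrum):
    """sqrt(re^2 + im^2) for every complex value."""
    return [math.hypot(c.real, c.imag) for c in spectrum]


def peak_bin(spectrum):
    """Index of the largest magnitude among the first half of the output."""
    half = len(spectrum) // 2
    mags = magnitudes(spectrum[:half])
    return max(range(half), key=lambda k: mags[k])


# A test tone that completes exactly 5 cycles in 64 samples.
n = 64
samples = [math.sin(2.0 * math.pi * 5 * t / n) for t in range(n)]
spectrum = naive_dft(samples)
print(peak_bin(spectrum), magnitudes(spectrum)[5])
```

Only the first half of the output is searched because, for real input, the second half mirrors the first.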

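Assume the FFT of N PCM samples returns N/2 complex values representing the positive frequencies from 0 to F/2. Then the distance between two adjacent values is (F/2)/(N/2) = F/N Hz. With F = 44100 Hz and N = 1024 samples this is about 43.1 Hz. This is your frequency resolution. If you need to tell apart frequencies that are closer together than that, for example low-frequency content, you need to extend the FFT window.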
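The trade-off behind the window size can be put into numbers with a small sketch. It takes the spacing between adjacent FFT values as the sample rate divided by the number of input samples; the window sizes in the loop are arbitrary examples, not recommendations from the answer.

```python
def frequency_resolution(sample_rate, n_samples):
    """Spacing in Hz between adjacent FFT values."""
    return sample_rate / n_samples


def window_duration(sample_rate, n_samples):
    """Length in seconds of audio covered by one FFT window."""
    return n_samples / sample_rate


for n in (1024, 4096, 16384):
    print(
        n,
        round(frequency_resolution(44100, n), 2), "Hz per value,",
        round(window_duration(44100, n) * 1000, 1), "ms per window",
    )
```

A longer window gives finer frequency spacing but covers more time, so each peak is located less precisely in time.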

别靠近我心 2024-08-08 22:28:39

Well,
start with a raw array of 512 complex numbers representing the input wave, sampled at 8192 Hz. The real parts hold the samples and the imaginary parts are set to zero (which suits this use), and the array is passed to the FFT.

The result is an array of 512 complex values. Each complex value encodes two useful quantities for its frequency: a magnitude and a phase.

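To get the spacing between adjacent FFT values (the frequency resolution), divide the sample rate by the buffer size:

8192/512 = 16

So 16 Hz is the resolution of the FFT values: the output reports how strong the signal is near frequencies that are multiples of 16 Hz.

For example, suppose the wave contains these components:

frequency : 3 48 23 128 Hz
Amplitude : 10 5 12 8 dB (ref = 1)

After the FFT, values are available only at:

frequency : 0 16 32 48 64 ... 128 ... Hz

The 48 Hz and 128 Hz components fall exactly on FFT values (3 × 16 and 8 × 16), so they show up at their own frequencies. The 3 Hz and 23 Hz components fall between FFT values, so their power is spread mainly over the neighboring values (0 and 16 Hz, and 16 and 32 Hz) instead of appearing at 3 Hz and 23 Hz.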
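The difference between a component that lands on an FFT value and one that falls in between can be checked with a small sketch. It reuses the 8192 Hz sample rate, the 512-sample buffer, and the 48 Hz and 23 Hz components from the example, with 8192/512 = 16 Hz between adjacent FFT values. The naive DFT and the helper names are assumptions made for illustration, and unit-amplitude sine waves are used rather than the dB figures above.

```python
import cmath
import math

SAMPLE_RATE = 8192   # Hz, from the example
BUFFER_SIZE = 512    # samples, from the example


def half_spectrum_magnitudes(samples):
    """Naive DFT magnitudes for indices 0 .. n/2 - 1 (slow stand-in for an FFT)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def peak_energy_fraction(freq_hz):
    """Return (peak index, share of the half-spectrum energy held by that index)."""
    samples = [math.sin(2.0 * math.pi * freq_hz * t / SAMPLE_RATE)
               for t in range(BUFFER_SIZE)]
    energies = [m * m for m in half_spectrum_magnitudes(samples)]
    peak = max(range(len(energies)), key=lambda k: energies[k])
    return peak, energies[peak] / sum(energies)


on_bin = peak_energy_fraction(48)    # 48 Hz = 3 * 16 Hz, exactly on an FFT value
off_bin = peak_energy_fraction(23)   # 23 Hz lies between 16 Hz and 32 Hz
print("48 Hz:", on_bin)
print("23 Hz:", off_bin)
```

The 48 Hz tone puts essentially all of its energy into index 3, while the 23 Hz tone spreads a noticeable share of its energy over several indices.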

The FFT output is in the frequency domain, meaning the values are arranged by frequency. The time domain, by contrast, arranges the samples by time, the way we listen to music from second zero to second N.

The FFT output runs in ascending order of frequency, starting at frequency 0. For real input, only the first half of the output, up to half the sample rate, carries independent information.

The input is not a continuous signal with infinitely many points. The audio is sampled once every (1/sample rate) seconds, and the samples are collected in a buffer (512 samples in our case). Each buffer of 512 samples is passed to the FFT, and the output is 512 FFT values.

Because the output is arranged by frequency, it no longer shows when something happened inside the buffer. Each value describes one frequency over the whole buffer.

The frequencies are reported on a regular grid whose step is the sample rate divided by the buffer size, which in our case is 8192/512 = 16 Hz.

So the power is shown at every 16 Hz, and a component that lies between two grid points contributes to the nearby values according to how close it is to each of them.

A finer resolution is achieved by using a larger buffer at the same sample rate. Raising the sample rate without enlarging the buffer makes the grid coarser, not finer.

To display the spectrum, we print the indices in ascending order, each with its corresponding amplitude.

Amplitude (dB) = 20 * log10(|output| / ref)
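The decibel conversion can be sketched in a few lines of Python. The function name `amplitude_db` and the `floor_db` cut-off are assumptions added for the example; the cut-off only exists so that a magnitude of zero does not raise an error in the logarithm.

```python
import math


def amplitude_db(magnitude, ref=1.0, floor_db=-120.0):
    """Convert a linear magnitude to decibels relative to ref.

    floor_db is an arbitrary lower limit (an assumption of this sketch)
    returned for zero or extremely small magnitudes.
    """
    if magnitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(magnitude / ref))


for value in (1.0, 10.0, 0.5, 0.0):
    print(value, "->", round(amplitude_db(value), 2), "dB")
```

A magnitude equal to the reference gives 0 dB, and every factor of 10 in magnitude adds 20 dB.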

The amplitude printed next to each index shows the power at that index's frequency, and the finer the resolution, the more precisely a component's frequency can be located.

In conclusion, the FFT produces an array of amplitudes indexed by frequency: each amplitude expresses the power at the frequency of its index.

橘和柠 2024-08-08 22:28:39

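You may actually be looking for a spectrogram, which is basically an FFT of the data in a small window that is slid along the time axis. If you have software that implements this, it might save you some effort. It is commonly used for analyzing acoustic signals that vary over time, and it is a very useful way to look at sounds. There are also some tricks, such as windowing the data before the FFT, that a spectrogram implementation will probably get right but that would be harder (though not really hard) for you to get right yourself.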
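A minimal sketch of the sliding-window idea, returning a (time, frequency) pair for the strongest component in each window, might look like the following. Everything here is an assumption chosen to keep the example small and runnable: the helper names, the naive DFT standing in for an FFT, the Hann window, the 64-sample window with a 32-sample hop, and the synthetic 1024 Hz test signal. Real audio would use a proper FFT routine and much larger windows.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window, tapering each frame toward zero at its edges."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]


def half_spectrum_magnitudes(frame):
    """Naive DFT magnitudes for indices 0 .. n/2 - 1 (slow stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def sliding_peaks(samples, sample_rate, window_size=64, hop=32):
    """Return (frame start time in s, peak frequency in Hz) for each frame."""
    window = hann(window_size)
    peaks = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(window_size)]
        mags = half_spectrum_magnitudes(frame)
        k = max(range(len(mags)), key=lambda i: mags[i])
        peaks.append((start / sample_rate, k * sample_rate / window_size))
    return peaks


def tone(freq_hz, t, sample_rate):
    return math.sin(2.0 * math.pi * freq_hz * t / sample_rate)


# Synthetic signal: 64 Hz for the first 256 samples, then 192 Hz.
sample_rate = 1024
samples = ([tone(64, t, sample_rate) for t in range(256)]
           + [tone(192, t, sample_rate) for t in range(256, 512)])
peaks = sliding_peaks(samples, sample_rate)
print(peaks[0], peaks[-1])
```

The frames that straddle the change at sample 256 contain both tones, which is the usual cost of a sliding window: the window length limits how sharply an event can be placed in time.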
