Converting an FFT to a spectrogram

Posted 2024-08-10 07:35:22

I have an audio file that I iterate through, taking 512 samples at each step and passing them through an FFT.

The output is a block of 514 floats (from IPP's ippsFFTFwd_RToCCS_32f_I) with real and imaginary components interleaved.

My problem is: what do I do with these complex numbers once I have them? At the moment, for each value, I compute:

const float realValue   = buffer[(y * 2) + 0];
const float imagValue   = buffer[(y * 2) + 1];
const float value       = sqrt( (realValue * realValue) + (imagValue * imagValue) );

This gives something slightly usable, but I'd rather have some way of getting the values into the range 0 to 1. The problem with the above is that the peaks come back as around 9 or more. This means things get viciously saturated, while other parts of the spectrogram barely show up even though they appear quite strong when I run the audio through Audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (other than that it represents the frequency values of the 512-sample block I'm passing in). In particular, I don't understand what exactly the complex number represents.

Any advice and help would be much appreciated!

Edit: Just to clarify. My big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?

Edit2: I get really nice looking results by doing the following:

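size_t count2   = 0;
size_t max2     = kFFTSize + 2;   // 514 floats when kFFTSize is 512: interleaved (re, im) pairs
while( count2 < max2 )
{
    const float realValue   = buffer[(count2) + 0];
    const float imagValue   = buffer[(count2) + 1];
    // Scaled log-magnitude; note that log10f returns -inf when the magnitude is 0.
    const float value   = (log10f( sqrtf( (realValue * realValue) + (imagValue * imagValue) ) * rcpVerticalZoom ) + 1.0f) * 0.5f;
    // In-place write is safe: the write index (count2 / 2) never passes the read index.
    buffer[count2 >> 1] = value;
    count2 += 2;
}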

To my eye this even looks better than most other spectrogram implementations I have looked at.

Is there anything MAJORLY wrong with what I'm doing?

捎一片雪花 2024-08-17 07:35:22

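The usual way to make everything in an FFT visible is to take the logarithm of the magnitude.

The position in the output buffer tells you which frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong that frequency is, and the phase (arctangent) gives you information that matters a lot more in image space than in audio space. Because the FFT is discrete, the frequencies run from 0 to the Nyquist frequency. In images, the first term (DC) is usually the largest, so it is a good candidate for normalization if that is your aim. I don't know whether that is also true for audio (I doubt it).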
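
As a rough illustration of the log-magnitude idea, here is a minimal sketch that maps interleaved (re, im) output to display values in the range 0 to 1. The function name `logMagnitude01`, the `fullScale` parameter and the -96 dB floor are assumptions made for this example; they do not come from the question or from IPP.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Convert interleaved (re, im) FFT output into display values in [0, 1].
// fullScale: magnitude that should map to 1.0 (assumption: the caller knows it,
//            e.g. N/2 for a full-scale sine with a rectangular window).
// floorDb:   negative dB level that maps to 0.0 (assumed default: -96 dB).
std::vector<float> logMagnitude01(const std::vector<float>& interleaved,
                                  float fullScale,
                                  float floorDb = -96.0f)
{
    const std::size_t bins = interleaved.size() / 2;
    const float minMag = std::pow(10.0f, floorDb / 20.0f);
    std::vector<float> out(bins);
    for (std::size_t k = 0; k < bins; ++k) {
        const float re = interleaved[2 * k];
        const float im = interleaved[2 * k + 1];
        const float mag = std::sqrt(re * re + im * im) / fullScale;
        // Clamp before the log so that silence gives floorDb instead of -inf.
        const float db = 20.0f * std::log10(std::max(mag, minMag));
        // floorDb maps to 0, 0 dB maps to 1, anything louder is clipped to 1.
        out[k] = std::min(1.0f, std::max(0.0f, 1.0f - db / floorDb));
    }
    return out;
}
```

With this mapping a bin 48 dB below full scale lands at 0.5, so quiet content stays visible instead of being crushed by the peaks.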

离旧人 2024-08-17 07:35:22

For each window of 512 samples, you compute the magnitude of the FFT as you did. Each value represents the magnitude of the corresponding frequency present in the signal.

mag
 /\
 |
 |      !         !
 |      !    !    !
 +--!---!----!----!---!--> freq
 0          Fs/2      Fs

Now we need to figure out the frequencies.

Since the input signal is real-valued, the FFT magnitude is symmetric around the middle (the Nyquist component), with the first term being the DC component. Given the signal's sampling frequency Fs, the Nyquist frequency is Fs/2, and for index k the corresponding frequency is k*Fs/512.

So for each window of length 512, we get the magnitudes at the specified frequencies. Stacking these over consecutive windows forms the spectrogram.

懷念過去 2024-08-17 07:35:22

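Just so people know, I've done a LOT of work on this whole problem. The main thing I've discovered is that the FFT output needs normalising.

To do this, average all the values of your window vector to get a value somewhat less than 1 (or exactly 1 if you are using a rectangular window). Then multiply that number by the number of frequency bins you have after the FFT.

Finally, divide the magnitudes returned by the FFT by this normalisation number. Your amplitude values should now be in the 0 to 1 range, or -Inf to 0 once you take the log. Log, etc., as you please; you will still be working with a known range.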
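
One way to sketch this normalisation: for a sine that fits the window, the raw FFT peak is the amplitude times half the sum of the window coefficients, so multiplying magnitudes by 2 / sum(window) recovers the amplitude. The names `makeHann` and `amplitudeScale` are hypothetical, and the Hann window is just an assumed choice.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Periodic Hann window of length n (an assumed choice; any window works).
std::vector<float> makeHann(std::size_t n)
{
    const double twoPi = 6.283185307179586;
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i)
        w[i] = static_cast<float>(0.5 - 0.5 * std::cos(twoPi * i / n));
    return w;
}

// Factor that turns a raw FFT magnitude into sine amplitude for the given
// window. The 2 accounts for the energy split between positive and negative
// frequencies; the DC and Nyquist bins would use 1 / sum instead.
float amplitudeScale(const std::vector<float>& window)
{
    const double sum = std::accumulate(window.begin(), window.end(), 0.0);
    return static_cast<float>(2.0 / sum);
}
```

For a 512-sample rectangular window the factor is 1/256, so a full-scale sine comes out at 1.0. For the Hann window above it is 4/512, because the window's average value is 0.5.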

心房敞 2024-08-17 07:35:22

There are a few things that I think you will find helpful.

The forward FT will tend to give larger numbers in the output than in the input. You can think of it as all of the intensity at a certain frequency being displayed in one place rather than being distributed through the dataset. Does this matter? Probably not, because you can always scale the data to fit your needs. I once wrote an integer-based FFT/IFFT pair, and each pass required rescaling to prevent integer overflow.

The real data that are your input are converted into something that is almost complex. As it turns out buffer[0] and buffer[n/2] are real and independent. There is a good discussion of it here.

The input data are sound intensity values taken over time, equally spaced. They are said to be, appropriately enough, in the time domain. The output of the FT is said to be in the frequency domain because the horizontal axis is frequency. The vertical scale remains intensity. Although it isn't obvious from the input data, there is phase information in the input as well. Although all of the sound is sinusoidal, there is nothing that fixes the phases of the sine waves. This phase information appears in the frequency domain as the phases of the individual complex numbers, but often we don't care about it (and often we do too!). It just depends upon what you are doing. The calculation

const float value = sqrt((realValue * realValue) + (imagValue * imagValue));

retrieves the intensity information but discards the phase information. Taking the logarithm essentially just dampens the big peaks.
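
To make the intensity/phase split concrete, here is a small sketch that pulls both out of an interleaved (re, im) buffer. The `Polar` struct and `toPolar` function are hypothetical names used only for this illustration.

```cpp
#include <cmath>
#include <cstddef>

// Magnitude and phase (in radians, range -pi to pi) of one complex bin.
struct Polar {
    float magnitude;
    float phase;
};

// Split bin k of an interleaved (re, im) buffer into magnitude and phase.
inline Polar toPolar(const float* interleaved, std::size_t k)
{
    const float re = interleaved[2 * k];
    const float im = interleaved[2 * k + 1];
    // hypot computes sqrt(re^2 + im^2); atan2 picks the correct quadrant.
    return { std::hypot(re, im), std::atan2(im, re) };
}
```

A spectrogram normally plots only `magnitude`; `phase` matters when you want to resynthesise or time-stretch the signal.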

Hope this is helpful.

我的影子我的梦 2024-08-17 07:35:22

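If you are getting strange results, one thing to check is the FFT library's documentation to see how the output is packed. Some routines use a packed format in which real and imaginary values are interleaved, or they may start at element N/2 and wrap around.

As a sanity check, I suggest creating sample data with known characteristics, e.g. tones at Fs/2 or Fs/4 (Fs = sample frequency), and comparing the output of the FFT routine with what you expect. Try creating both a sine and a cosine at the same frequency: they should have the same magnitude in the spectrum but different phases (i.e. realValue/imagValue will differ, but the sum of squares should be the same).

If you intend to use FFTs, though, you really need to know how they work mathematically; otherwise you are likely to run into other strange problems such as aliasing.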
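
The sanity check could be sketched roughly like this, using a naive single-bin DFT as the reference to compare a library's output against. `dftBin` and `makeTone` are hypothetical helpers written for this example, not part of any FFT library. The usage note uses a tone at Fs/4 rather than Fs/2, because a sine at exactly Fs/2 with zero starting phase samples to all zeros.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N) evaluation of a single DFT bin, intended only as a reference.
std::complex<double> dftBin(const std::vector<double>& x, std::size_t k)
{
    const double twoPi = 6.283185307179586;
    const std::size_t n = x.size();
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        const double angle = -twoPi * static_cast<double>(k * i) / static_cast<double>(n);
        sum += x[i] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return sum;
}

// Unit-amplitude test tone with a whole number of cycles in the block,
// so all of its energy lands in bin number `cycles`.
std::vector<double> makeTone(std::size_t n, std::size_t cycles, bool cosine)
{
    const double twoPi = 6.283185307179586;
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double angle = twoPi * static_cast<double>(cycles * i) / static_cast<double>(n);
        x[i] = cosine ? std::cos(angle) : std::sin(angle);
    }
    return x;
}
```

With n = 512 and cycles = 128 (a tone at Fs/4), both the sine and the cosine should give a magnitude of 256 in bin 128. The cosine's phase should be 0 and the sine's should be -pi/2, and every other bin should be close to zero. If your FFT routine disagrees, the packing or the scaling is probably not what you assumed.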
