How to extract semi-precise frequencies from a WAV file using a Fourier transform

Posted on 2024-09-02 08:34:00

Let us say that I have a WAV file. The file contains a series of sine tones at precise 1-second intervals. I want to use the FFTW library to extract these tones in sequence. Is this particularly hard to do? How would I go about it?

Also, what is the best way to write tones of this kind into a WAV file? I assume I would only need a simple audio library for the output.

My language of choice is C.

Comments (3)

晚风撩人 2024-09-09 08:34:00

To get the power spectrum of a section of your file:

  • collect N samples, where N is a power of 2. For example, at a 44.1 kHz sample rate with roughly one tone per second, N = 32768 samples (about 0.74 s) is a reasonable choice.

  • apply a suitable window function to the samples, e.g. a Hann (Hanning) window

  • pass the windowed samples to an FFT routine. Ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT, pass 0 for all the imaginary input parts

  • calculate the squared magnitude of your FFT output bins (re * re + im * im)

  • (optional) calculate 10 * log10 of each magnitude squared output bin to get a magnitude value in dB

Now that you have your power spectrum you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N. For the above example of a 44.1 kHz sample rate and N = 32768, the frequency resolution of each bin is 44100 / 32768 ≈ 1.35 Hz.

赤濁 2024-09-09 08:34:00

You are basically interested in estimating a spectrum, assuming you've already gone past the stage of reading the WAV and converting it into a discrete-time signal.

Among the various methods, the most basic is the periodogram, which amounts to taking a windowed Discrete Fourier Transform (with an FFT) and keeping its squared magnitude. This corresponds to Paul's answer. You need a window which spans several periods of the lowest frequency you want to detect. Example: if your sinusoids can be as low as 10 Hz (period = 100 ms), you should take a window of 200 ms or 300 ms or so (or more). However, the periodogram has some disadvantages, though it's simple to compute and more than enough if high precision is not required:

The raw periodogram is not a good spectral estimate because of spectral bias and the fact that the variance at a given frequency does not decrease as the number of samples used in the computation increases.

The periodogram can perform better by averaging over several windows, with a judicious choice of their widths (Bartlett's method). There are also many other methods for estimating the spectrum (e.g. AR modelling).
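The averaging step of that approach might look like the sketch below. It assumes the per-segment power spectra have already been computed and stored back to back in one array; the function name average_periodograms and that memory layout are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: averages num_segments periodograms bin by bin.
 * power holds the segments back to back (segment s, bin k is at
 * power[s * num_bins + k]); avg receives num_bins averaged values. */
void average_periodograms(const double *power, size_t num_segments,
                          size_t num_bins, double *avg)
{
    assert(power != NULL && avg != NULL && num_segments > 0);

    for (size_t k = 0; k < num_bins; k++) {
        double sum = 0.0;
        for (size_t s = 0; s < num_segments; s++)
            sum += power[s * num_bins + k];
        avg[k] = sum / (double)num_segments;
    }
}
```

Averaging reduces the variance of each bin at the cost of frequency resolution, since each segment is shorter than the full block.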

Actually, you are not really interested in estimating a full spectrum, only in the location of a single frequency. This can be done by looking for a peak in an estimated spectrum (as explained above), but also with more specific and powerful (and complicated) methods such as Pisarenko's method or the MUSIC algorithm. They would probably be overkill in your case.

夜夜流光相皎洁 2024-09-09 08:34:00

WAV files contain linear pulse code modulated (LPCM) data. That just means that it is a sequence of amplitude values at a fixed sample rate. A RIFF header is contained at the beginning of the file to convey information like sampling rate and bits per sample (e.g. 8 kHz signed 16-bit).

The format is very simple and you could easily roll your own reader and writer. However, there are several libraries available to speed up the process, such as libsndfile. Simple DirectMedia Layer (SDL)/SDL_mixer and PortAudio are two nice libraries for playback.
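As a rough illustration of how small the format is, the sketch below fills in the usual 44-byte header for uncompressed 16-bit PCM. The function name wav_header_pcm16 is hypothetical, and the sketch assumes the simplest layout (one "fmt " chunk followed directly by the "data" chunk, no extra chunks).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* WAV fields are little-endian, so write bytes explicitly. */
static void put_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
}

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Hypothetical helper: fills hdr with a 44-byte header for 16-bit PCM.
 * num_frames is the number of samples per channel. */
void wav_header_pcm16(uint8_t hdr[44], uint32_t sample_rate,
                      uint16_t channels, uint32_t num_frames)
{
    assert(hdr != NULL && channels > 0);

    uint16_t block_align = (uint16_t)(channels * 2); /* bytes per frame */
    uint32_t data_bytes = num_frames * block_align;

    memcpy(hdr, "RIFF", 4);
    put_u32le(hdr + 4, 36 + data_bytes);             /* file size minus 8 */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_u32le(hdr + 16, 16);                         /* fmt chunk size */
    put_u16le(hdr + 20, 1);                          /* format 1 = PCM */
    put_u16le(hdr + 22, channels);
    put_u32le(hdr + 24, sample_rate);
    put_u32le(hdr + 28, sample_rate * block_align);  /* bytes per second */
    put_u16le(hdr + 32, block_align);
    put_u16le(hdr + 34, 16);                         /* bits per sample */
    memcpy(hdr + 36, "data", 4);
    put_u32le(hdr + 40, data_bytes);
}
```

To write a tone, one would write this header with fwrite and then the samples themselves as little-endian signed 16-bit values, for example a scaled sine wave.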

As for feeding the data into FFTW, you would need to buffer 1-second chunks (determine the size from the sample rate and bits per sample). Then convert all of the samples to IEEE floating-point (i.e. float or double depending on the FFTW configuration; libsndfile can do this for you). Next, create another array to hold the frequency-domain output. Finally, create and execute an FFTW plan by passing both buffers to fftw_plan_dft_r2c_1d and calling fftw_execute with the returned fftw_plan handle.
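The buffering and conversion steps could look something like the sketch below, assuming mono, signed 16-bit samples that are already in memory. Both function names are hypothetical, and dividing by 32768 is just one common scaling convention. The resulting double array is the kind of input buffer that would be handed to fftw_plan_dft_r2c_1d in a double-precision FFTW build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scales signed 16-bit PCM into doubles in [-1.0, 1.0). */
void pcm16_to_double(const int16_t *pcm, double *out, size_t count)
{
    assert(pcm != NULL && out != NULL);

    for (size_t i = 0; i < count; i++)
        out[i] = (double)pcm[i] / 32768.0;
}

/* Hypothetical helper: number of complete 1-second blocks in a mono file.
 * Block b then starts at frame b * sample_rate. */
size_t whole_seconds(size_t total_frames, unsigned sample_rate)
{
    assert(sample_rate > 0);

    return total_frames / sample_rate;
}
```

For a stereo file the channels would first need to be de-interleaved or mixed down, which this sketch does not cover.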
