Using the jTransforms library on a WAV file?
I am trying to do spectral analysis on a WAV file using the jTransforms library: Official Site
But I am unsure how to convert the WAV file into acceptable input for the FFT in jTransforms, and how to display a frequency spectrum after the FFT. I have searched around Google and found that I need to somehow convert the WAV file into a double[] or Complex[]. And afterwards, how should I interpret the output?
Sorry, I am very new to FFT, so this question may sound extremely stupid. Many thanks!
Comments (1)
I don't know your library, but I guess it has extensive documentation on how to apply the transforms.
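For the input side of the question, here is a minimal sketch of turning WAV audio into a double[] using only the JDK's javax.sound.sampled package. It assumes 16-bit signed mono PCM, which is common but by no means the only WAV layout; the class and method names (WavSamples, pcm16ToDoubles, readMono16) are made up for illustration.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of jTransforms.
final class WavSamples {

    /** Converts 16-bit signed PCM bytes into doubles in the range [-1, 1). */
    static double[] pcm16ToDoubles(byte[] pcm, boolean bigEndian) {
        double[] out = new double[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            int b0 = pcm[2 * i];
            int b1 = pcm[2 * i + 1];
            int hi = bigEndian ? b0 : b1;          // high byte keeps its sign
            int lo = (bigEndian ? b1 : b0) & 0xFF; // low byte is treated as unsigned
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }

    /** Reads a whole WAV file; this sketch only accepts 16-bit signed mono PCM. */
    static double[] readMono16(File wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat fmt = in.getFormat();
            boolean supported = AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                    && fmt.getSampleSizeInBits() == 16
                    && fmt.getChannels() == 1;
            if (!supported) {
                throw new UnsupportedAudioFileException("sketch handles 16-bit signed mono PCM only");
            }
            return pcm16ToDoubles(in.readAllBytes(), fmt.isBigEndian());
        }
    }
}
```

The resulting array (or a block of it) is the kind of input a real-valued FFT expects. In jTransforms that would be something like new DoubleFFT_1D(n).realForward(samples), which transforms the array in place; check the Javadoc of your version for the package name and the exact output packing.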
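Regarding the interpretation: if you use a complex transform, each frequency bin comes back as a complex number. Neither part means much on its own. The magnitude sqrt(re^2 + im^2) tells you how strongly that frequency is present, and the angle atan2(im, re) is the phase of the sinusoid.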
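As a small illustration, magnitude and phase are both derived from the real and imaginary parts together. The sketch below assumes an interleaved layout, data[2k] = real part and data[2k+1] = imaginary part of bin k; as far as I know this is what jTransforms' complexForward produces, while realForward packs its output slightly differently, so check the Javadoc. BinPolar is a hypothetical name.

```java
// Hypothetical helper; assumes interleaved output: data[2k] = Re, data[2k + 1] = Im.
final class BinPolar {

    /** Magnitude of bin k: sqrt(re^2 + im^2). */
    static double magnitude(double[] data, int k) {
        return Math.hypot(data[2 * k], data[2 * k + 1]);
    }

    /** Phase of bin k in radians, in the range [-pi, pi]. */
    static double phase(double[] data, int k) {
        return Math.atan2(data[2 * k + 1], data[2 * k]);
    }
}
```

A spectrum display normally plots the magnitude (often converted to decibels) against frequency and ignores the phase.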
The power spectral density (PSD) can be computed, up to a normalization constant, as |X[k]|^2, which is equal to X[k] * conj(X[k]) = re^2 + im^2 (so multiply each bin by its complex conjugate).
One thing you might have to consider is rescaling your FFT output. Many implementations return an unnormalized forward transform, so the output values grow in proportion to fftSize. In that case you have to multiply the output by 1/fftSize.
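To make the power and scaling steps concrete, here is a rough sketch. It uses a naive O(n^2) DFT purely as a stand-in for the library FFT so that the example is self-contained; the class name PowerSpectrum and the interleaved re/im layout are assumptions for illustration, not jTransforms API.

```java
// Hypothetical helper; the naive DFT only stands in for a real FFT library call.
final class PowerSpectrum {

    /** Naive unnormalized DFT of real input; output is interleaved (re, im) per bin. */
    static double[] dft(double[] x) {
        int n = x.length;
        double[] out = new double[2 * n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            out[2 * k] = re;
            out[2 * k + 1] = im;
        }
        return out;
    }

    /** Scales each bin by 1/n, then returns re^2 + im^2 for bins 0..n/2. */
    static double[] power(double[] spectrum) {
        int n = spectrum.length / 2;
        double[] p = new double[n / 2 + 1];
        for (int k = 0; k < p.length; k++) {
            double re = spectrum[2 * k] / n;
            double im = spectrum[2 * k + 1] / n;
            p[k] = re * re + im * im; // same as X[k] * conj(X[k]) after scaling
        }
        return p;
    }
}
```

With this scaling, a cosine of amplitude 1 that falls exactly on a bin shows up as 0.25 in that bin (amplitude 0.5 squared, the other half of the energy being in the mirrored bin). Other normalizations are equally valid as long as you apply them consistently.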
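And one last thing, in case you are not aware of it: for real input you only need the first half of the FFT output, because the spectrum is symmetric (the upper half mirrors the lower half as complex conjugates).
Bin 0 (fftData[0]) is the DC component, and bin k corresponds to the frequency k * sampleRate / fftSize. The middle bin (fftSize/2) marks the Nyquist frequency, sampleRate/2, which is the highest frequency you can analyze. It is set by the sample rate of the recording, not by fftSize.
So if you want to display frequencies up to 22 kHz, make sure the audio is sampled at 44 kHz or more. fftSize controls the resolution instead: neighbouring bins are sampleRate / fftSize apart.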
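For labelling the frequency axis, the relation between bin index, sample rate and FFT size can be sketched as follows. The class and method names are invented for this example.

```java
// Hypothetical helper for mapping FFT bins to frequencies.
final class BinFrequency {

    /** Centre frequency in Hz of bin k for a given FFT size and sample rate. */
    static double binToHz(int k, int fftSize, double sampleRate) {
        return k * sampleRate / fftSize;
    }

    /** Number of non-redundant bins for real input: DC up to and including Nyquist. */
    static int usefulBins(int fftSize) {
        return fftSize / 2 + 1;
    }

    /** Highest frequency that can be represented at the given sample rate. */
    static double nyquistHz(double sampleRate) {
        return sampleRate / 2.0;
    }
}
```

For example, with a 44.1 kHz recording and fftSize = 1024, the bins are about 43 Hz apart and bin 512 sits at 22050 Hz.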
There are many pitfalls with FFT, so be sure to read up on the basics and understand what you are doing. The mathematics itself is not that important if you just want to use it, so you can skip the derivations.
EDIT: There is even more. If you don't feed your whole WAV file as input, consider weighting each block of input data with a tapered window (Gaussian, Hamming, Hann...) to avoid nasty edge effects. Otherwise the abrupt block edges leak energy into neighbouring bins, and you get artificial frequency content in your FFT output that is simply not present in the original.
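A minimal sketch of such a window, here the symmetric Hann window, applied to one block of samples before the transform. HannWindow is a hypothetical name, and Hann is only one reasonable choice among the windows mentioned.

```java
// Hypothetical helper: symmetric Hann window for one block of samples.
final class HannWindow {

    /** Window coefficients w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))), for n >= 2. */
    static double[] coefficients(int n) {
        if (n < 2) {
            throw new IllegalArgumentException("window needs at least 2 points");
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
        }
        return w;
    }

    /** Returns a windowed copy of the block; the input array is left untouched. */
    static double[] apply(double[] block) {
        double[] w = coefficients(block.length);
        double[] out = new double[block.length];
        for (int i = 0; i < block.length; i++) {
            out[i] = block[i] * w[i];
        }
        return out;
    }
}
```

The window tapers both ends of the block to zero, so the block no longer starts and stops abruptly. Keep in mind that windowing also lowers the overall level of the spectrum, which matters if you compare absolute magnitudes.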