Analyzing audio using the Fast Fourier Transform
I am trying to create a graphical spectrum analyzer in python.
I am currently reading 1024 bytes of a 16-bit, dual-channel, 44,100 Hz audio stream and averaging the amplitudes of the two channels together, so I now have an array of 256 signed shorts. I want to perform an FFT on that array, using a module like numpy, and use the result to create the graphical spectrum analyzer, which to start with will have just 32 bars.
I have read the Wikipedia articles on the Fast Fourier Transform and the Discrete Fourier Transform, but I am still unclear about what the resulting array represents. This is what the array looks like after I perform an FFT on my array using numpy:
[ -3.37260500e+05 +0.00000000e+00j 7.11787022e+05 +1.70667403e+04j
4.10040193e+05 +3.28653370e+05j 9.90933073e+04 +1.60555003e+05j
2.28787050e+05 +3.24141951e+05j 2.09781047e+04 +2.31063376e+05j
-2.15941453e+05 +1.63773851e+05j -7.07833051e+04 +1.52467334e+05j
-1.37440802e+05 +6.28107674e+04j -7.07536614e+03 +5.55634993e+03j
-4.31009964e+04 -1.74891657e+05j 1.39384348e+05 +1.95956947e+04j
1.73613033e+05 +1.16883207e+05j 1.15610357e+05 -2.62619884e+04j
-2.05469722e+05 +1.71343186e+05j -1.56779748e+04 +1.51258101e+05j
-2.08639913e+05 +6.07372799e+04j -2.90623668e+05 -2.79550838e+05j
-1.68112214e+05 +4.47877871e+04j -1.21289916e+03 +1.18397979e+05j
-1.55779104e+05 +5.06852464e+04j 1.95309737e+05 +1.93876325e+04j
-2.80400414e+05 +6.90079265e+04j 1.25892113e+04 -1.39293422e+05j
3.10709174e+04 -1.35248953e+05j 1.31003438e+05 +1.90799303e+05j...
I am wondering what exactly these numbers represent and how I would convert these numbers into a percentage of a height for each of the 32 bars. Also, should I be averaging the 2 channels together?
Comments (4)
The array you are showing contains the Fourier Transform coefficients of the audio signal. These coefficients can be used to get the frequency content of the audio. The FFT is defined for complex-valued input, so the coefficients you get out will be complex numbers even though your input is all real values. To see how strongly each frequency is present, you need to calculate the magnitude of the FFT coefficient for that frequency. This is not just the real component of the coefficient; you need the square root of the sum of the squares of its real and imaginary components. That is, if your coefficient is a + b*j, then its magnitude is sqrt(a^2 + b^2). The power at that frequency is proportional to the square of this magnitude.
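As a minimal sketch of the magnitude calculation with numpy (the 1 kHz test tone below is a made-up stand-in for one frame of audio, not data from the question), `np.abs` on a complex array gives sqrt(a^2 + b^2) for every coefficient at once:

```python
import numpy as np

# Hypothetical test frame: 256 samples of a 1 kHz sine at 44100 Hz.
sample_rate = 44100
n = 256
t = np.arange(n) / sample_rate
frame = 10000 * np.sin(2 * np.pi * 1000 * t)

coeffs = np.fft.fft(frame)      # 256 complex coefficients of the form a + b*j
magnitudes = np.abs(coeffs)     # sqrt(a^2 + b^2) for each coefficient

# The same thing written out by hand, to show what np.abs computes.
manual = np.sqrt(coeffs.real ** 2 + coeffs.imag ** 2)
```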
Once you have calculated the magnitude of each FFT coefficient, you need to figure out which audio frequency each coefficient belongs to. An N-point FFT gives you the frequency content of your signal at N equally spaced frequencies, starting at 0. Because your sampling frequency is 44100 samples/sec and the number of points in your FFT is 256, your frequency spacing is 44100 / 256 = 172 Hz (approximately).
The first coefficient in your array is the 0 Hz (DC) coefficient. It is the sum of all the samples in the frame, so it reflects the average level (DC offset) of the signal rather than any audible frequency. The following coefficients count up from 0 in multiples of 172 Hz until you reach coefficient 128 (N/2), which corresponds to 22050 Hz. In an FFT you can only measure frequencies up to half your sampling rate. Read up on the Nyquist Frequency and the Nyquist-Shannon Sampling Theorem if you are a glutton for punishment and need to know why, but the basic result is that, for real-valued input, the coefficients above N/2 are mirror images (complex conjugates) of the ones below it. So the frequencies start at 0, increase by 172 Hz per coefficient up to the N/2 coefficient, and then decrease by 172 Hz per coefficient until the N - 1 coefficient. For a spectrum display you only need coefficients 0 through N/2.
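The bin-to-frequency mapping can be checked with numpy's helper functions. This is only an illustrative sketch for the 256-point, 44100 Hz case discussed here: `np.fft.fftfreq` lists the frequency of every coefficient (the upper half shows up as negative frequencies), while `np.fft.rfftfreq` lists only the 0 to Nyquist half that a real-input transform such as `np.fft.rfft` returns.

```python
import numpy as np

sample_rate = 44100
n = 256

bin_spacing = sample_rate / n                          # about 172 Hz per coefficient

# Frequency of each of the 256 coefficients returned by np.fft.fft.
all_freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)

# Frequencies of the 129 coefficients (0 .. N/2) returned by np.fft.rfft.
half_freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

# For real input, coefficient N - k is the complex conjugate of coefficient k.
frame = np.random.default_rng(0).normal(size=n)        # arbitrary real-valued frame
coeffs = np.fft.fft(frame)
mirrored = np.allclose(coeffs[1:], np.conj(coeffs[:0:-1]))
```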
That should be enough information to get you started. If you would like a much more approachable introduction to FFTs than the one on Wikipedia, you could try Understanding Digital Signal Processing, 2nd Ed. It was very helpful for me.
So that is what those numbers represent. Converting to a percentage of height could be done by dividing each frequency component's magnitude by the sum of all component magnitudes. However, that only gives you a representation of the relative frequency distribution, not the actual power at each frequency. You could try scaling by the maximum magnitude possible for a frequency component, but I'm not sure that would display very well. The quickest way to find a workable scaling factor is to experiment with loud and soft audio signals until you find the right setting.
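The two scaling options mentioned above might look roughly like this. It is a sketch under stated assumptions, not a tuned display routine: the function names are made up for illustration, and the "maximum possible magnitude" is taken to be that of a full-scale 16-bit sine landing exactly on one bin, which is full_scale * N / 2.

```python
import numpy as np

def to_percent_relative(magnitudes):
    """Scale so that all bars together add up to 100 (relative distribution only)."""
    total = magnitudes.sum()
    if total == 0:
        return np.zeros_like(magnitudes, dtype=float)
    return 100.0 * magnitudes / total

def to_percent_full_scale(magnitudes, n_samples, full_scale=32768.0):
    """Scale against the magnitude of a full-scale sine centered on one bin."""
    max_magnitude = full_scale * n_samples / 2      # assumed upper bound per bin
    return np.clip(100.0 * magnitudes / max_magnitude, 0.0, 100.0)

# Hypothetical check: a half-scale sine with exactly 10 cycles in 256 samples.
n = 256
sine = 16384.0 * np.sin(2 * np.pi * 10 * np.arange(n) / n)
mags = np.abs(np.fft.rfft(sine))
relative = to_percent_relative(mags)
absolute = to_percent_full_scale(mags, n)           # bin 10 comes out near 50
```

Real music rarely gets close to full scale in a single bin, so the second variant tends to produce short bars; a logarithmic (dB) scale or an empirically chosen factor is a common alternative.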
Finally, you should be averaging the two channels together if you want to show the frequency content of the entire audio signal as a whole. You are mixing the stereo audio into mono audio and showing the combined frequencies. If you want two separate displays for right and left frequencies, then you will need to perform the Fourier Transform on each channel separately.
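For reference, mixing the 1024-byte stereo chunk down to 256 mono samples could be sketched as follows. The byte layout is an assumption (interleaved left/right, little-endian signed 16-bit, which is typical for WAV-style PCM), and `stereo_bytes_to_mono` is a hypothetical helper name.

```python
import numpy as np

def stereo_bytes_to_mono(raw):
    """Decode interleaved 16-bit little-endian stereo PCM and average L and R."""
    samples = np.frombuffer(raw, dtype="<i2")        # signed 16-bit shorts
    stereo = samples.reshape(-1, 2)                  # column 0 = left, column 1 = right
    # Convert to float before averaging so that L + R cannot overflow int16.
    return stereo.astype(np.float64).mean(axis=1)

# Made-up 1024-byte chunk: 256 frames with left = 1000 and right = 3000.
chunk = np.tile(np.array([1000, 3000], dtype="<i2"), 256).tobytes()
mono = stereo_bytes_to_mono(chunk)
```

For separate left and right displays, `stereo[:, 0]` and `stereo[:, 1]` would each be transformed on their own instead of being averaged.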
Although this thread is years old, I found it very helpful. I just wanted to give my input to anyone who finds this and is trying to create something similar.
As for the division into bars, this should not be done as antti suggests, by dividing the data equally based on the number of bars. It is more useful to divide the data into octave parts, each octave being double the frequency of the previous one (i.e. 100 Hz is one octave above 50 Hz, which is one octave above 25 Hz).
Depending on how many bars you want, you divide the whole range into 1/X octave ranges.
Based on a given center frequency A for a bar, you get the lower and upper limits of the bar from:
lower limit = A / 2^(1/(2X)), upper limit = A * 2^(1/(2X))
To calculate the next adjoining center frequency you use a similar calculation:
next center = A * 2^(1/X)
You then average the data that fits into these ranges to get the amplitude for each bar.
For example:
We want to divide into 1/3-octave ranges and we start with a center frequency of 1 kHz.
Given 44100 Hz and 1024 samples (43 Hz between each data point), the band runs from 890.9 Hz to 1122.5 Hz, so we should average values 21 through 26 (890.9 / 43 = 20.72 ~ 21 and 1122.5 / 43 = 26.10 ~ 26).
(1/3-octave bars would get you around 30 bars between ~40 Hz and ~20 kHz.)
As you can figure out by now, as we go higher we average a larger range of numbers. Low bars typically include only 1 or a small number of data points, while the higher bars can be the average of hundreds of points. The reason is that 86 Hz is an octave above 43 Hz, while 10086 Hz sounds almost the same as 10043 Hz.
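The octave-based grouping described in this answer could be sketched like this. The function names are hypothetical, and the band edges use the standard 1/X-octave relations (edges at A / 2^(1/(2X)) and A * 2^(1/(2X)), next center at A * 2^(1/X)), which reproduce the 890.9 Hz and 1122.5 Hz limits of the 1 kHz example.

```python
import numpy as np

def band_bin_range(center, fraction, bin_width):
    """Return (first_bin, last_bin) of the 1/fraction-octave band around center."""
    half_step = 2.0 ** (1.0 / (2.0 * fraction))
    lower = center / half_step
    upper = center * half_step
    return int(round(lower / bin_width)), int(round(upper / bin_width))

def octave_bars(magnitudes, bin_width, start_center, fraction, n_bars):
    """Average the magnitudes that fall in each successive 1/fraction-octave band."""
    bars = []
    center = start_center
    last = len(magnitudes) - 1
    for _ in range(n_bars):
        lo, hi = band_bin_range(center, fraction, bin_width)
        lo, hi = min(lo, last), min(hi, last)        # stay inside the spectrum
        bars.append(magnitudes[lo:hi + 1].mean())
        center *= 2.0 ** (1.0 / fraction)            # next adjoining center frequency
    return np.array(bars)

bin_width = 44100 / 1024                             # about 43 Hz per data point
first, last = band_bin_range(1000.0, 3, bin_width)   # the 1 kHz, 1/3-octave example

# Dummy spectrum (magnitude equals bin index) just to exercise the averaging.
dummy = np.arange(513, dtype=float)
bars = octave_bars(dummy, bin_width, 1000.0, 3, 3)
```

Because the edges are rounded to whole bins, neighbouring bars can share an edge bin; whether to assign such a bin to one bar or both is a design choice this sketch leaves open.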
What you have is a sample whose length in time is 256/44100 = 0.00580499 seconds. This means that your frequency resolution is 1 / 0.00580499 = 172 Hz. The 256 values you get out from Python correspond to the frequencies 0 Hz, 172 Hz, 344 Hz and so on, in steps of 172 Hz. Because the input is real, only the first half (up to 128*172 Hz = 22050 Hz) carries independent information; the second half mirrors it. The numbers you get out are complex numbers (hence the "j" at the end of every second number).
EDITED: FIXED WRONG INFORMATION
You need to convert the complex numbers into amplitudes by calculating sqrt(i^2 + j^2), where i and j are the real and imaginary parts, respectively.
If you want 32 bars, you should, as far as I understand, take the average of four successive amplitudes from the 128 values in the first half, getting 128 / 4 = 32 bars as you want.
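One way to get 32 equal-width bars from a 256-sample frame is sketched below. It assumes only the non-mirrored half of the spectrum is used and, as a display choice rather than a requirement, drops the DC coefficient so that exactly 128 bins remain; the random frame is placeholder data.

```python
import numpy as np

# Placeholder frame of 256 samples in the 16-bit range.
rng = np.random.default_rng(0)
frame = rng.integers(-32768, 32767, size=256).astype(np.float64)

spectrum = np.fft.rfft(frame)           # 129 coefficients: bins 0 .. 128
amplitudes = np.abs(spectrum)[1:]       # drop the DC bin, leaving 128 amplitudes

# Group into 32 rows of 4 successive amplitudes and average each row.
bars = amplitudes.reshape(32, 4).mean(axis=1)
```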
The FFT returns N complex values, for each of which you can compute the modulus = sqrt(real_part^2 + imaginary_part^2). To get the value for each band, you have to sum the moduli of all harmonics inside the band. Below you can see an example of a 10-bar spectrum analyzer. The C code has to be wrapped to get a pyd Python module. I designed and made a whole 10-LED-bar spectrum analyzer in Python. Instead of using the numpy library (too big and useless to get just the FFT), a Python pyd module (just 27 KB) was created to get the FFT and to split the entire audio spectrum into bands.
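The poster's C module is not reproduced in this thread. Purely as an illustration of "sum the moduli inside each band", here is a numpy sketch; the helper name, the 1024-sample frame size and the ten octave-wide band edges (20 Hz up to 20480 Hz) are all assumptions, not the poster's actual design.

```python
import numpy as np

def sum_bands(magnitudes, freqs, edges):
    """Sum the magnitudes of all bins whose frequency lies in each [lo, hi) band."""
    sums = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        sums.append(magnitudes[mask].sum())
    return np.array(sums)

sample_rate = 44100
n = 1024
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)   # frequency of each rfft bin

# Assumed band edges: 20, 40, 80, ..., 20480 Hz, giving 10 octave-wide bands.
edges = 20.0 * 2.0 ** np.arange(11)

# Flat dummy spectrum, so each band's sum equals the number of bins it contains.
flat = np.ones(len(freqs))
bands = sum_bands(flat, freqs, edges)
```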
In addition, to read the output audio, a loopback WASAPI portaudio pyd module was created. You can see the project (block diagram) in the image:
[Image: 10BarsSpectrumAnalyzerWithWASapi.jpg]
Just added a tutorial video on my YouTube channel: how to design and make a very smart Python Spectrum Analyzer 10 Led Bar