Measuring audio "loudness" with Python
I'm looking to calculate the loudness of a piece of audio using Python — probably by extracting the peak volume of a piece of audio, or possibly using a more accurate measure (RMS?).
What's the best way to do this? I've had a look at pyaudio, but that didn't seem to do what I wanted. What looked good was ruby-audio, as this seemingly has sound.abs.max built into it.
The input audio will be taken from various local MP3 files that are around 30s in duration.
Comments (1)
I think RMS would be the most accurate measure. One thing to note is that we perceive loudness differently at different frequencies, so convert the audio to frequency space with an FFT (numpy.fft should work fine on only 30s of audio) and compute a power spectral density from the result. Weight the PSD by frequency using some loudness curve. This matters especially for frequencies below 10Hz: there will be a lot of power there (it would dominate an RMS calculation in the time domain), yet we can't hear it. Now integrate the weighted PSD and take the square root, and that will give a perceived RMS.
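The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the MP3 has already been decoded into a mono array of float samples (decoding is not shown), and the standard A-weighting curve is used as the "loudness curve", which is my choice rather than something the answer specifies. The function names a_weight_gain and weighted_rms are made up for this example.

```python
import numpy as np

def a_weight_gain(freqs):
    """Linear amplitude gain of the A-weighting curve at each frequency (Hz).

    A-weighting is an assumed choice of loudness curve; any other curve
    could be substituted here.
    """
    f2 = np.asarray(freqs, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2 dB offset normalises the gain to roughly 1 at 1 kHz.
    return (num / den) * 10 ** (2.0 / 20.0)

def weighted_rms(samples, sample_rate):
    """Frequency-weighted RMS of a mono signal, computed via the FFT."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Power per bin. rfft returns a one-sided spectrum, so every bin except
    # DC (and Nyquist when n is even) stands for two bins of the full FFT.
    power = np.abs(spectrum) ** 2
    scale = np.full(power.shape, 2.0)
    scale[0] = 1.0
    if n % 2 == 0:
        scale[-1] = 1.0
    power *= scale

    # Weight the power spectrum (gain squared, since gain is for amplitude),
    # then sum ("integrate") and take the square root. Dividing by n**2
    # follows Parseval's theorem, so that with no weighting the result
    # would equal the ordinary time-domain RMS.
    weighted = power * a_weight_gain(freqs) ** 2
    return np.sqrt(weighted.sum() / n ** 2)
```

Calling weighted_rms(samples, 44100) on a decoded 30s clip would return a single number. A 1 kHz tone comes out close to its plain RMS, while content at a few Hz contributes almost nothing.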
You can also break the mp3 into sections or windows and apply this technique to give the volume in particular sections.
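As a rough sketch of the windowing idea, the snippet below splits the decoded samples into fixed-length, non-overlapping windows and computes a plain time-domain RMS for each. It assumes a mono float array that has already been decoded from the MP3; the function name windowed_rms and the one-second default window are made up for illustration. The frequency weighting described above could be applied to each window instead of the plain RMS.

```python
import numpy as np

def windowed_rms(samples, sample_rate, window_seconds=1.0):
    """Plain RMS of each non-overlapping window of a mono signal."""
    x = np.asarray(samples, dtype=float)
    win = max(1, int(round(window_seconds * sample_rate)))
    n_windows = x.size // win
    if n_windows == 0:
        return np.array([])
    # Drop the trailing partial window, then view the signal as one row per window.
    frames = x[: n_windows * win].reshape(n_windows, win)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

The result is one value per window, so the loudest section can be found with np.argmax on the returned array.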