Visualizing the volume of PCM samples
I have several chunks of PCM audio (G.711) in my C++ application. I would like to visualize the audio volume of each of these chunks.
My first attempt was to calculate the average of the sample values for each chunk and use that as a volume indicator, but this doesn't work well. I do get 0 for chunks with silence and differing values for chunks with audio, but the values differ only slightly and don't seem to reflect the actual volume.
What would be a better algorithm to calculate the volume?
I hear G.711 audio is logarithmic PCM. How should I take that into account?
Note, I haven't worked with G.711 PCM audio myself, but I presume that you are performing the correct conversion from the encoded amplitude to an actual amplitude before processing the values.
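G.711 stores each sample as an 8-bit companded (logarithmic) code, so averaging or squaring the raw bytes says little about amplitude; they need to be expanded to linear PCM first. Below is a minimal sketch of μ-law expansion, assuming the chunks are μ-law (the other G.711 variant is A-law). It mirrors the bit manipulation of the widely distributed public-domain `g711.c` routines; `ulaw_to_linear` and `decode_ulaw` are hypothetical helper names, not an existing API.

```cpp
#include <cstdint>
#include <vector>

// Expand one 8-bit G.711 mu-law code to a 16-bit linear PCM sample.
// Hypothetical helper; follows the classic g711.c ulaw2linear() logic.
std::int16_t ulaw_to_linear(std::uint8_t code) {
    code = static_cast<std::uint8_t>(~code);   // mu-law bytes are stored inverted
    int t = ((code & 0x0F) << 3) + 0x84;       // 4-bit mantissa plus bias (0x84 = 132)
    t <<= (code & 0x70) >> 4;                  // 3-bit segment (exponent) selects the shift
    // After inversion, a set sign bit means a negative sample.
    return static_cast<std::int16_t>((code & 0x80) ? (0x84 - t) : (t - 0x84));
}

// Decode a whole chunk of mu-law bytes into linear samples (hypothetical helper).
std::vector<std::int16_t> decode_ulaw(const std::vector<std::uint8_t>& chunk) {
    std::vector<std::int16_t> out;
    out.reserve(chunk.size());
    for (std::uint8_t b : chunk) out.push_back(ulaw_to_linear(b));
    return out;
}
```

With this mapping the bytes 0xFF and 0x7F both decode to 0, and the extremes 0x80 and 0x00 decode to +32124 and -32124, so the decoded values fit comfortably in 16 bits.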
You'd expect the average sample value of most chunks to be approximately zero, because sound waveforms oscillate on either side of zero and the positive and negative samples largely cancel out.
A crude volume calculation is the RMS (root mean square): take a rolling average of the squares of the samples, then take the square root of that average. This gives a positive quantity whenever there is some sound; the quantity is related to the power represented in the waveform.
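A minimal sketch of the RMS calculation, assuming each chunk has already been decoded to 16-bit linear PCM (`std::int16_t`). `chunk_rms` (one value per chunk) and `rolling_rms` (one value per sliding window) are hypothetical names; the 64-bit integer accumulator keeps the sum of squares exact.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// RMS of one chunk of linear PCM: sqrt(mean(x^2)). Returns 0 for an empty chunk.
double chunk_rms(const std::vector<std::int16_t>& samples) {
    if (samples.empty()) return 0.0;
    std::int64_t sum_sq = 0;  // int16^2 summed over any realistic chunk fits in 64 bits
    for (std::int16_t s : samples) sum_sq += static_cast<std::int64_t>(s) * s;
    return std::sqrt(static_cast<double>(sum_sq) / static_cast<double>(samples.size()));
}

// Rolling RMS over a sliding window of `window` samples.
// Emits one value per full window position (samples.size() - window + 1 values).
std::vector<double> rolling_rms(const std::vector<std::int16_t>& samples, std::size_t window) {
    std::vector<double> out;
    if (window == 0 || samples.size() < window) return out;
    std::int64_t sum_sq = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // Add the square of the sample entering the window.
        sum_sq += static_cast<std::int64_t>(samples[i]) * samples[i];
        if (i >= window) {
            // Remove the square of the sample leaving the window.
            sum_sq -= static_cast<std::int64_t>(samples[i - window]) * samples[i - window];
        }
        if (i + 1 >= window) {
            out.push_back(std::sqrt(static_cast<double>(sum_sq) / static_cast<double>(window)));
        }
    }
    return out;
}
```

For the chunk {3, -3, 3, -3} the plain average is 0, while `chunk_rms` returns 3, which is the difference the question ran into.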
For a measure that correlates better with human perception of volume, you may want to investigate the techniques used in Replay Gain.
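Perceived loudness is roughly logarithmic, so a cheap step in that direction is to display RMS in decibels relative to full scale (dBFS). This is not Replay Gain, which adds further processing such as an equal-loudness filter; it is only a sketch, assuming 16-bit samples with 32768 as the 0 dBFS reference. `rms_to_dbfs` and its -96 dB floor (about the dynamic range of 16-bit audio) are hypothetical choices.

```cpp
#include <cmath>

// Convert an RMS value of 16-bit samples to dB relative to full scale (dBFS).
// Assumption: 32768 (the int16 full-scale magnitude) is the 0 dBFS reference.
// Silence (rms == 0) has no finite dB value, so results are clamped to floor_db.
double rms_to_dbfs(double rms, double floor_db = -96.0) {
    if (rms <= 0.0) return floor_db;
    const double db = 20.0 * std::log10(rms / 32768.0);
    return db < floor_db ? floor_db : db;
}
```

Some tools instead define 0 dBFS as the RMS of a full-scale sine wave, which shifts every reading by about 3 dB; either convention works for comparing chunks against each other.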
If you're feeling ambitious, you can download the G.711 specification from the ITU website and spend the next few weeks (or maybe more) implementing it.
If you're lazier (or more sensible) than that, you can download G.191 instead; it includes source code to compress and decompress G.711-encoded data.
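G.711 defines two companding laws, μ-law (used mainly in North America and Japan) and A-law (used in Europe and most other regions), and they expand differently. If the chunks are A-law, expansion looks roughly like the sketch below. It mirrors the public-domain `g711.c` `alaw2linear()` routine rather than the G.191 code itself, so check it against G.191 before relying on it; `alaw_to_linear` is a hypothetical name.

```cpp
#include <cstdint>

// Expand one 8-bit G.711 A-law code to a 16-bit linear PCM sample.
// Hypothetical helper; follows the classic g711.c alaw2linear() logic.
std::int16_t alaw_to_linear(std::uint8_t code) {
    code = static_cast<std::uint8_t>(code ^ 0x55);  // A-law bytes have alternate bits inverted
    int t = (code & 0x0F) << 4;                     // 4-bit mantissa
    const int seg = (code & 0x70) >> 4;             // 3-bit segment (exponent)
    switch (seg) {
        case 0:  t += 8; break;
        case 1:  t += 0x108; break;
        default: t += 0x108; t <<= seg - 1; break;
    }
    // In A-law a set sign bit means a positive sample.
    return static_cast<std::int16_t>((code & 0x80) ? t : -t);
}
```

The bytes 0xD5 and 0x55 decode to +8 and -8, the smallest A-law magnitudes, so a silent A-law chunk is typically filled with those values rather than with zero bytes.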
Once you've decoded it, visualizing the volume should be a whole lot easier.
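Once each decoded chunk has been reduced to a single level in dBFS, a very simple visualization is a bar whose length scales linearly with that level. A sketch; `level_bar`, the -60 dB floor and the 40-character width are arbitrary, hypothetical choices. In a GUI the same 0 to 1 fraction could drive the height of a meter or the envelope of a waveform view.

```cpp
#include <cstddef>
#include <string>

// Map a level in dBFS to a bar of '#' characters, at most `width` long.
// Levels at or below floor_db give an empty bar; 0 dBFS gives a full bar.
// floor_db must be negative.
std::string level_bar(double dbfs, double floor_db = -60.0, std::size_t width = 40) {
    if (dbfs < floor_db) dbfs = floor_db;
    if (dbfs > 0.0) dbfs = 0.0;
    const double frac = (dbfs - floor_db) / -floor_db;  // 0.0 at the floor, 1.0 at 0 dBFS
    // Round to the nearest whole character.
    const auto n = static_cast<std::size_t>(frac * static_cast<double>(width) + 0.5);
    return std::string(n, '#');
}
```

For example, a chunk at -12 dBFS produces a 32-character bar, and a chunk at -30 dBFS a 20-character one.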