Volume from byte array

Posted on 2024-10-06 19:49:15

I'm new to audio analysis, but need to perform a (seemingly) simple task. I have a byte array containing a 16-bit, single-channel recording with a sample rate of 44100 Hz. How do I perform a quick analysis to get the volume at any given moment? I need a threshold check: a function that returns true if the signal is above a certain amplitude (volume) and false if not. I thought I could iterate through the byte array and check each value, with 255 being the loudest, but this doesn't seem to work: even when I don't record anything, background noise gets in and parts of the array are filled with 255. Any suggestions would be great.
Thanks

Comments (3)

情感失落者 2024-10-13 19:49:15

As you have 16-bit data, you should expect the signal to vary between -32768 and +32767. Each sample therefore occupies two bytes of the array, so checking individual bytes against 255 is not meaningful.
To calculate the volume, you can take intervals of, say, 1000 samples and calculate their RMS value: sum the squared sample values, divide by 1000, and take the square root. Check this number against your threshold.
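The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming the byte array holds little-endian signed 16-bit PCM (the question does not state the byte order); the names `decode_pcm16`, `rms` and `loud_windows` are made up for this example.

```python
import math
import struct


def decode_pcm16(data):
    """Turn raw bytes into signed 16-bit samples (assumes little-endian PCM)."""
    n = len(data) // 2  # two bytes per sample; a trailing odd byte is ignored
    return list(struct.unpack("<%dh" % n, data[: n * 2]))


def rms(samples):
    """Root mean square of a block of samples; 0.0 for an empty block."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_windows(data, threshold, window=1000):
    """Return one bool per window: True where the window's RMS exceeds threshold."""
    samples = decode_pcm16(data)
    return [
        rms(samples[i : i + window]) > threshold
        for i in range(0, len(samples), window)
    ]
```

Note that a very quiet sample such as -1 is stored as the two bytes 255, 255, which may be why near-silent recordings appear to be "filled with 255" when inspected byte by byte. The threshold is in sample units (0 to 32768) and would need to be tuned against the actual background noise.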

追风人 2024-10-13 19:49:15

Typically one measures the energy of a wave using its root mean square.

If you want to be more perceptually accurate, you can take the time-domain signal through a discrete Fourier transform to a frequency-domain signal, and integrate over the magnitudes with some weighting function (since the ear is not equally sensitive at all frequencies: at the same energy, low and very high frequencies sound quieter than mid-range ones).

But I don't know audio stuff either, so I'm just making stuff up. ☺
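One possible reading of the frequency-weighted idea above, as a rough sketch only: the answer does not name a weighting function, so the standard A-weighting curve is swapped in here as an assumption. The DFT is written out naively (O(N²)) to stay dependency-free, which is only practical for small blocks; `a_weight_gain` and `weighted_rms` are illustrative names.

```python
import cmath
import math


def a_weight_gain(freq_hz):
    """Linear A-weighting gain for a frequency in Hz (about 1.0 at 1 kHz)."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return (num / den) * 10 ** (2.0 / 20)  # +2 dB puts 1 kHz at 0 dB


def weighted_rms(samples, sample_rate=44100):
    """A-weighted RMS of one block of samples, using a naive DFT."""
    n = len(samples)
    if n == 0:
        return 0.0
    energy = 0.0
    for k in range(n):
        # DFT coefficient for bin k
        x_k = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        # Bins above n/2 mirror the lower ones for a real-valued signal
        freq = min(k, n - k) * sample_rate / n
        energy += (a_weight_gain(freq) * abs(x_k)) ** 2
    # Parseval: mean square of the block equals sum(|X_k|^2) / n^2
    return math.sqrt(energy) / n
```

With this weighting, a 1 kHz tone comes out at roughly its plain RMS, while a 100 Hz tone of the same amplitude is attenuated to roughly a tenth. For real workloads an FFT library would replace the inner loop.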

盛装女皇 2024-10-13 19:49:15

I might try applying a sliding-window standard deviation. On the other hand, I would not have assumed that 255 = loudest. It may be, but I'd want to know what encoding is being used. If any compression is present, then I doubt 255 is "loudest."
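A sliding-window standard deviation might look something like the sketch below. It assumes the bytes have already been decoded into numeric samples; `sliding_std` is a hypothetical helper name, and the window length is left to the caller.

```python
import math
from collections import deque


def sliding_std(samples, window):
    """Yield the population standard deviation of each full window.

    The window advances one sample at a time; running sums keep the
    cost per step constant.
    """
    buf = deque()
    total = 0.0
    total_sq = 0.0
    for s in samples:
        buf.append(s)
        total += s
        total_sq += s * s
        if len(buf) > window:
            old = buf.popleft()
            total -= old
            total_sq -= old * old
        if len(buf) == window:
            mean = total / window
            # Clamp tiny negative values caused by floating-point rounding
            variance = max(total_sq / window - mean * mean, 0.0)
            yield math.sqrt(variance)
```

Unlike a plain RMS, the standard deviation subtracts the window mean, so a constant DC offset in the recording does not register as volume. Comparing each yielded value against a threshold gives the true/false answer the question asks for.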
