I'm trying to get the volume level of a WAV file using Java's Sound API, but I can't figure it out

Posted on 2024-10-17 03:18:45

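I'll be processing a large number of audio files. I don't need to play them, but I'd like to get the volume level periodically (say, once per second) so I can roughly plot the overall volume across the whole file. I've used the Java Sound API to read the file's frames, but I'm not sure how to interpret them (I don't know how to handle little-endian byte order or how to split the frames into the two channels). I tried sending the AudioInputStream to a SourceDataLine and calling getLevel() on the line every second, but it always returns 0.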


烟花易冷人易散 2024-10-24 03:18:45


If you have 16-bit signed PCM, the values decoded from the bytes will have the range of a Java short integer (-32768 to 32767). The nature of a sound wave, though, is to sweep back and forth, so at any single frame the value could be almost anywhere and is thus not particularly well correlated with the volume you hear.

So I suspect you will have to look at many samples and do some sort of aggregate analysis. Perhaps add up all the deviations from 0 as absolute values and divide by the number of frames? How many frames would be needed? Well, if we want to include, say, bass sound waves that cycle at 50 cps (50 Hz), and want to make sure we include an entire cycle, that's 1/50 of a full second's worth of frames. If your sample rate is 44100 fps, that's 44100 / 50 = 882 frames. But perhaps using this rolling average distorts the contributions of other nearby frequencies?
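A minimal sketch of that averaging idea, assuming the samples have already been decoded to floats in [-1, 1) (for example with the conversion line below). The class and method names (`MeanAbsLevel`, `framesPerCycle`, `meanAbs`) are hypothetical. `framesPerCycle` does the window arithmetic from the paragraph above, and `meanAbs` sums absolute deviations from 0 and divides by the frame count.

```java
/** Hypothetical helper: mean absolute value over a window of decoded samples. */
public class MeanAbsLevel {

    /** Frames needed to cover one full cycle of the given frequency. */
    public static int framesPerCycle(float sampleRate, float frequencyHz) {
        // e.g. 44100 frames/s divided by 50 cycles/s gives 882 frames per cycle
        return (int) Math.ceil(sampleRate / frequencyHz);
    }

    /** Sums |sample| over samples[start .. start + length) and divides by length. */
    public static double meanAbs(float[] samples, int start, int length) {
        if (length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = start; i < start + length; i++) {
            sum += Math.abs(samples[i]);
        }
        return sum / length;
    }

    public static void main(String[] args) {
        float rate = 44100f;
        int window = framesPerCycle(rate, 50f);
        System.out.println("window = " + window + " frames");

        // Synthetic 50 Hz sine at half amplitude, one second long.
        float[] samples = new float[(int) rate];
        for (int n = 0; n < samples.length; n++) {
            samples[n] = (float) (0.5 * Math.sin(2 * Math.PI * 50 * n / rate));
        }
        // For a sine of amplitude A, the mean absolute value over whole cycles is 2A/pi.
        System.out.println("level = " + meanAbs(samples, 0, window));
    }
}
```

The mean absolute value is the measure the answer proposes; RMS is a common alternative, but it is not what is described here.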

Caveat: I'm mostly self-taught, so there may be a better way to do this.

Here is the line of code I use to convert one 16-bit sample of a little-endian track to a float between -1 and 1 (er, 0.999...), where buffer is a byte array:

// buffer[i] is the LSB and buffer[i + 1] the MSB of one 16-bit little-endian sample
float audioVal = (float) (((buffer[i + 1] << 8) | (buffer[i] & 0xff)) / 32768.0);
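To see why the LSB is masked, the small demo below runs that same conversion with and without `& 0xff` on the sample 0x00ff (255). `MaskDemo`, `toFloat`, and `toFloatNoMask` are hypothetical names; `toFloatNoMask` is deliberately wrong and exists only for comparison.

```java
/** Shows why the LSB needs "& 0xff" in the 16-bit little-endian conversion. */
public class MaskDemo {

    /** The conversion from the answer: buffer[i] = LSB, buffer[i + 1] = MSB. */
    public static float toFloat(byte[] buffer, int i) {
        return (float) (((buffer[i + 1] << 8) | (buffer[i] & 0xff)) / 32768.0);
    }

    /** Same conversion without the mask, for comparison only (deliberately wrong). */
    public static float toFloatNoMask(byte[] buffer, int i) {
        return (float) (((buffer[i + 1] << 8) | buffer[i]) / 32768.0);
    }

    public static void main(String[] args) {
        // Sample 0x00ff = 255: LSB 0xff, MSB 0x00.
        byte[] buffer = {(byte) 0xff, (byte) 0x00};
        // With the mask the LSB contributes 255. Without it, the byte 0xff is
        // sign-extended to 0xffffffff and the OR wipes out the MSB entirely.
        System.out.println("masked:   " + toFloat(buffer, 0) * 32768);
        System.out.println("unmasked: " + toFloatNoMask(buffer, 0) * 32768);
    }
}
```

The MSB needs no mask: its sign extension is exactly what makes values from 0x8000 to 0xffff come out negative.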

If you search Stack Overflow, you'll find other posts with similar conversions. The MSB is shifted and retains its sign. The LSB is ANDed with 0xff so that, when the byte is promoted to an int, its top bit is treated as part of the value instead of being sign-extended. The MSB and LSB are ORed together and divided by 32768 (the magnitude of the most negative short) to "normalize" the range. I think the four bytes of a frame for little-endian 16-bit stereo encoding are ordered as follows: b[0] = left LSB, b[1] = left MSB, b[2] = right LSB, b[3] = right MSB. I can't recall where I saw this officially defined or posted, though. It would be embarrassing to have swapped left and right!
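Splitting interleaved stereo into two channels could then look like the sketch below. It assumes exactly the 4-byte frame layout described above (left sample first, each sample little-endian); `StereoSplit`, `split`, and `decode` are hypothetical names, and a trailing partial frame is simply ignored.

```java
/** Hypothetical sketch: split interleaved 16-bit little-endian stereo frames. */
public class StereoSplit {

    /** Same conversion as the line above, taking the LSB and MSB explicitly. */
    static float decode(byte lsb, byte msb) {
        return (float) (((msb << 8) | (lsb & 0xff)) / 32768.0);
    }

    /**
     * Assumes b[0] = left LSB, b[1] = left MSB, b[2] = right LSB, b[3] = right MSB.
     * Returns {left, right} for every complete 4-byte frame in buffer[0 .. length).
     */
    public static float[][] split(byte[] buffer, int length) {
        int frames = length / 4;
        float[] left = new float[frames];
        float[] right = new float[frames];
        for (int f = 0; f < frames; f++) {
            int base = f * 4;
            left[f] = decode(buffer[base], buffer[base + 1]);
            right[f] = decode(buffer[base + 2], buffer[base + 3]);
        }
        return new float[][] {left, right};
    }

    public static void main(String[] args) {
        // Two frames: (left 0x4000, right 0xc000) and (left 0x0000, right 0x7fff).
        byte[] buffer = {0x00, 0x40, 0x00, (byte) 0xc0, 0x00, 0x00, (byte) 0xff, 0x7f};
        float[][] lr = split(buffer, buffer.length);
        for (int f = 0; f < lr[0].length; f++) {
            System.out.println("frame " + f + ": L=" + lr[0][f] + " R=" + lr[1][f]);
        }
    }
}
```

For a real file, the frame size is better taken from `AudioFormat.getFrameSize()` than hard-coded as 4.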

You would want to take the ABS before computing the moving average. Maybe the ABS could be built into the conversion to save a few CPU cycles.
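Putting the pieces together for the original question (one level per second, without playing anything), here is a sketch that folds the ABS into the conversion as suggested. Assumptions: the stream is 16-bit signed PCM (other encodings are rejected, not converted), the level is the mean absolute value over every sample of every channel in each one-second block, and `PerSecondLevels`, `levels`, and `absSample` are hypothetical names. It relies only on standard `javax.sound.sampled` calls (`AudioSystem.getAudioInputStream`, `AudioFormat.getFrameSize`, `getFrameRate`, `isBigEndian`) and never opens a `SourceDataLine`.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

/** Hypothetical sketch: one mean-absolute level per second of 16-bit signed PCM. */
public class PerSecondLevels {

    /** |sample| / 32768 for the 16-bit sample whose bytes start at buffer[i]. */
    static double absSample(byte[] buffer, int i, boolean bigEndian) {
        int lsb = bigEndian ? buffer[i + 1] : buffer[i];
        int msb = bigEndian ? buffer[i] : buffer[i + 1];
        int sample = (msb << 8) | (lsb & 0xff);
        return Math.abs(sample) / 32768.0;  // ABS folded into the conversion
    }

    public static List<Double> levels(AudioInputStream in) throws IOException {
        AudioFormat fmt = in.getFormat();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                || fmt.getSampleSizeInBits() != 16) {
            throw new IllegalArgumentException("expected 16-bit signed PCM, got " + fmt);
        }
        boolean bigEndian = fmt.isBigEndian();
        int frameSize = fmt.getFrameSize();  // bytes per frame, all channels
        int framesPerSecond = Math.round(fmt.getFrameRate());
        byte[] buffer = new byte[frameSize * framesPerSecond];
        List<Double> result = new ArrayList<>();
        while (true) {
            // read() may return fewer bytes than requested, so fill the buffer in a loop.
            int filled = 0;
            int n;
            while (filled < buffer.length
                    && (n = in.read(buffer, filled, buffer.length - filled)) > 0) {
                filled += n;
            }
            int usable = filled - (filled % frameSize);  // drop a trailing partial frame
            if (usable == 0) {
                break;
            }
            double sum = 0.0;
            for (int i = 0; i < usable; i += 2) {
                sum += absSample(buffer, i, bigEndian);
            }
            result.add(sum / (usable / 2));  // mean over all samples of all channels
            if (filled < buffer.length) {
                break;  // end of stream
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        AudioInputStream in;
        if (args.length > 0) {
            in = AudioSystem.getAudioInputStream(new File(args[0]));
        } else {
            // Hypothetical test signal: 2 s of 16-bit signed little-endian stereo at
            // 8000 frames/s, every sample 8192 (0x2000), i.e. a constant level of 0.25.
            AudioFormat fmt = new AudioFormat(8000f, 16, 2, true, false);
            byte[] pcm = new byte[2 * 8000 * fmt.getFrameSize()];
            for (int i = 0; i < pcm.length; i += 2) {
                pcm[i] = 0x00;      // LSB
                pcm[i + 1] = 0x20;  // MSB
            }
            in = new AudioInputStream(new ByteArrayInputStream(pcm), fmt,
                    pcm.length / fmt.getFrameSize());
        }
        List<Double> lv = levels(in);
        for (int s = 0; s < lv.size(); s++) {
            System.out.printf("second %d: %.4f%n", s, lv.get(s));
        }
    }
}
```

Each entry in the returned list is one point of the rough volume plot; a shorter block (for example one 882-frame window) would give a finer-grained curve.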
