Converting a PCM byte array to surround channels

Posted on 2024-10-30 22:26:28

As I understand it, the audio byte array that I am using (16-bit stereo PCM) takes 4 bytes per sample. I noticed that when you invert every byte value (i.e. -128 to 128 and 128 to -128), it does not put the sound in the surround channel. It sounds the same (front audio). I experimented with inverting every other byte (every 2 bytes) rather than all of the bytes and got something like surround sound, but it is very dirty and choppy. How exactly do I manipulate a regular 16-bit stereo PCM WAV file (in byte array form) so that the audio is placed in the surround channels?

My Code:

public byte[] putInSurround(byte[] audio) {
    // Each 4-byte step covers one 16-bit left sample and one 16-bit right sample.
    for (int i = 0; i < audio.length; i += 4) {
        // Negate each byte value on its own (0 stays 0; -128 wraps back to -128 on the cast).
        int i0 = -audio[i + 0];
        int i1 = -audio[i + 1];
        int i2 = -audio[i + 2];
        int i3 = -audio[i + 3];
        audio[i + 0] = (byte) i0;
        //audio[i + 1] = (byte) i1; <-- Commented Out For Every Other Byte.
        //audio[i + 2] = (byte) i2; <-- Commented Out For Every Other Byte.
        audio[i + 3] = (byte) i3;
    }
    return audio;
}


波浪屿的海角声 2024-11-06 22:26:28

I am by no means a DSP expert, but I have a few observations that might be helpful:

You parse your array in increments of 4 bytes, which correctly corresponds to a single 16-bit stereo sound sample: 2 channels * 16 bits = 32 bits = 4 bytes.
Now, I may not understand what you are trying to do, but in modern surround audio, the surround channels are usually independent of each other. That means that you will need more than 4 bytes per surround audio sample. If, for example, you have 5 channels of 16-bit audio, you will need 5 * 2 = 10 bytes/sample, which probably means that you need separate input and output arrays in your code.
There are methods such as Dolby Surround and Dolby Pro Logic, where the surround channels are matrix-encoded into the two stereo channels, but the DSP mathematics involved is far more complex than what you have in your code. Not to mention the need for a special decoder and the quality loss implied by such methods.
Inverting each byte of a 2-byte sample makes no sense: a sample value of 1000d would become -744d. Byte-wise operations like this are rarely used in DSP, if at all.
Usually audio samples are stored as signed 2's complement binary numbers. That makes handling them byte-wise quite complex, especially in a language with no unsigned numbers and no pointer casting such as Java. You would be better off converting the byte array into an array of short or int - or using a different programming language such as C++.
Inverting -128 produces +128, which cannot be stored in a signed byte as used by Java (range -128 to 127); the cast wraps it back to -128.
When "inverting every other byte", you store the inverse of i + 0 and i + 3, instead of i + 0 and i + 2 or i + 1 and i + 3.
The result of inverting every other byte, while still not making any sense, has a different effect depending on whether your audio representation is little-endian or big-endian. RIFF WAV files use little-endian byte order.
Inverting bytes 0 and 2 changes the least significant byte of each sample, which would merely add noise at high amplitudes and cause outright distortion when the dynamic range of the audio clip is limited.
Inverting bytes 1 and 3 changes the most significant byte, which would approximate inverting the whole sample at high amplitudes and add a lot of distortion in clips with limited dynamic range.
Inverting the whole sample, rather than individual bytes, flips its polarity, which approximates a 180-degree phase shift. I am not sure where you can use that, though...
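
To make a few of these points concrete, here is a minimal Java sketch, not a surround encoder. It assumes little-endian 16-bit PCM data with any WAV header already stripped. The class name `PcmSamples`, its method names, and the choice of slots 3 and 4 as the "surround" positions in a 5-channel frame are all made up for illustration; a real multichannel file would also need a header that declares the channel count and layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmSamples {

    /** Decodes little-endian 16-bit PCM bytes into signed samples (L, R, L, R, ...). */
    public static short[] toSamples(byte[] audio) {
        short[] samples = new short[audio.length / 2];
        ByteBuffer.wrap(audio).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    /** Encodes signed 16-bit samples back into little-endian PCM bytes. */
    public static byte[] toBytes(short[] samples) {
        byte[] audio = new byte[samples.length * 2];
        ByteBuffer.wrap(audio).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(samples);
        return audio;
    }

    /** Negates every byte on its own, like the question's loop with all four writes enabled. */
    public static byte[] negateEachByte(byte[] audio) {
        byte[] out = new byte[audio.length];
        for (int i = 0; i < audio.length; i++) {
            out[i] = (byte) -audio[i];
        }
        return out;
    }

    /** Negates whole samples (polarity inversion); -32768 is clamped to +32767. */
    public static short[] invertPolarity(short[] samples) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = (short) Math.min(Short.MAX_VALUE, -samples[i]);
        }
        return out;
    }

    /**
     * Copies stereo frames into wider frames in a separate output array.
     * Which slots count as "surround" is an assumption of this sketch;
     * all other channels are left silent.
     */
    public static short[] widen(short[] stereo, int channels, int leftSlot, int rightSlot) {
        int frames = stereo.length / 2;
        short[] out = new short[frames * channels];
        for (int f = 0; f < frames; f++) {
            out[f * channels + leftSlot] = stereo[2 * f];
            out[f * channels + rightSlot] = stereo[2 * f + 1];
        }
        return out;
    }

    public static void main(String[] args) {
        short[] frame = {1000, -1000};            // one stereo frame: left, right
        byte[] bytes = toBytes(frame);            // 4 bytes

        System.out.println(toSamples(negateEachByte(bytes))[0]); // -744
        System.out.println(invertPolarity(frame)[0]);            // -1000
        System.out.println(toBytes(widen(frame, 5, 3, 4)).length); // 10
    }
}
```

Working on `short` values keeps the sign handling in one place, and the separate output array in `widen` reflects the point that a 5-channel frame no longer fits in the 4 bytes of a stereo frame.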

If you need more help than this, you will have to tell us exactly what you are trying to do. You should at least mention what your expected output is and which DSP algorithms you are using.
