ZyXEL ADPCM codec
I have a ZyXEL USB Omni56K Duo modem and want to send and receive voice streams on it. To reach adequate quality I probably need to implement some "ZyXEL ADPCM" encoding, because plain PCM provides too low a sampling rate to transmit even medium-quality voice, and it doesn't work over USB either (probably because even this bitrate is too high for the USB-to-serial converter in the modem).
This mysterious codec figures in all Microsoft WAV-related libraries as one of the many codecs the format theoretically supports, but I have found no implementations.
Can someone offer an implementation in any language or maybe some documentation? Writing a custom mu-law decoding algorithm won't be a problem for me.
Thanks.
Comments (2)
I'm not sure how ZyXEL ADPCM differs from other flavors of ADPCM, but various ADPCM implementations can be found with a few Google searches.
However, the real reason for my post is to ask why you chose ADPCM. ADPCM is adaptive differential pulse-code modulation. This means the data being passed is the difference between samples, not the current value (which is also why you see such great compression). In a clean environment with no bit loss (i.e. a disk drive), this is fine. In a streaming environment, however, it's generally assumed that bits may be periodically mangled. With any bit damage to the data you'll be hearing static or other audio artifacts very quickly and, usually, fairly badly.
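To make the error-propagation point concrete, here is a minimal sketch of plain (non-adaptive) differential coding. It is not ZyXEL ADPCM or any real codec; the function names and sample values are made up for illustration. It only shows that one damaged difference shifts every sample decoded after it.

```python
def dpcm_encode(samples):
    """Encode each sample as its difference from the previous one."""
    deltas = []
    previous = 0
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def dpcm_decode(deltas):
    """Rebuild samples by accumulating the differences."""
    samples = []
    value = 0
    for delta in deltas:
        value += delta
        samples.append(value)
    return samples


if __name__ == "__main__":
    original = [0, 10, 20, 30, 40]
    deltas = dpcm_encode(original)

    damaged = list(deltas)
    damaged[1] ^= 0x40  # flip a single bit in one transmitted difference

    print(dpcm_decode(deltas))   # matches the original
    print(dpcm_decode(damaged))  # every sample from index 1 on is offset
```

Because the decoder only ever adds differences, nothing in the stream pulls it back to the true value until some kind of reset occurs.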
ADPCM's reset mechanism isn't frame based, which means the audio problems can go on for an extended period of time, depending on the encoder. The reset code is usually a run of 0s (16 comes to mind, but it's been years since I wrote my own ports).
ADPCM in the telephony environment usually converts a 12 bit PCM sample to a 4 bit ADPCM sample (not bad). As for audio quality: not bad for phone conversations and the spoken word, but most people, in a blind test, can easily detect the quality drop.
In your last sentence, you throw a curve ball into the question: you start mentioning muLaw. muLaw is a PCM encoding that takes a linear sample (14 bits in G.711) and compresses it on a logarithmic scale to an 8 bit sample. This is the typical compression mechanism for TDM (phone) networks in North America (most of the rest of the world uses a similar algorithm called ALaw).
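For reference, the logarithmic companding described above can be sketched as follows. This is the widely used software variant that takes 16 bit signed input (the bias of 0x84 and the clip value of 32635 come from that convention), not a transcription of the G.711 text, and the function names are my own.

```python
BIAS = 0x84   # added before the segment search so small values land in segment 0
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Compress one 16 bit signed PCM sample to an 8 bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # The segment (exponent) is the position of the highest set bit.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte):
    """Expand one 8 bit mu-law byte back to a 16 bit signed PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for pcm in (0, 100, -100, 1000, 32767):
        code = linear_to_ulaw(pcm)
        print(pcm, hex(code), ulaw_to_linear(code))
```

Each byte is sign, 3 bit segment and 4 bit mantissa, so quantization is fine near zero and coarse for loud samples. Unlike ADPCM, every byte decodes on its own, so a damaged byte affects only one sample.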
So I'm confused about what you are actually trying to find.
You also mentioned Microsoft and WAV implementations. You probably know this, but just in case: WAV is just a wrapper around the audio data that provides format, sampling information, channel count, size and other useful information. Without WAV, AU or another wrapper involved, muLaw and ADPCM are usually presented as raw data.
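As an illustration of the wrapper idea, the sketch below wraps raw 16 bit PCM in a WAV header using Python's standard `wave` module and then reads the header fields back with `struct`. The `wave` module only writes plain PCM (format tag 1); an ADPCM variant would be identified by a different format tag in the same field. The helper names and the 8000 Hz mono parameters are assumptions chosen for the example.

```python
import io
import struct
import wave


def wrap_pcm_in_wav(raw_pcm, sample_rate=8000, channels=1, sample_width=2):
    """Return WAV file bytes containing the given raw PCM data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(raw_pcm)
    return buffer.getvalue()


def read_fmt_fields(wav_bytes):
    """Parse the codec tag, channel count and sample rate from the header.

    Assumes the 'fmt ' chunk directly follows the 12 byte RIFF/WAVE
    preamble, which holds for files written by the wave module.
    """
    assert wav_bytes[0:4] == b"RIFF" and wav_bytes[8:12] == b"WAVE"
    assert wav_bytes[12:16] == b"fmt "
    format_tag, channels, sample_rate = struct.unpack_from("<HHL", wav_bytes, 20)
    return format_tag, channels, sample_rate


if __name__ == "__main__":
    raw = struct.pack("<4h", 0, 1000, -1000, 0)  # four 16 bit samples
    wav_bytes = wrap_pcm_in_wav(raw)
    print(read_fmt_fields(wav_bytes))
    print(len(wav_bytes) - len(raw), "bytes of header")
```

The audio payload itself is unchanged; the wrapper only tells a reader how to interpret it.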
One other tip if you are implementing ADPCM. As I indicated, it uses 4 bits to represent a 12 bit sample. It gets away with this because both sides hold the same multiplier (step size) table. Your position in the table changes based on the 4 bit value (in other words, the value is both multiplied against the current step size and used to figure out the new step size). I've seen a variety of algorithms use slightly different tables (no idea why, but you typically see the sent and received signals slowly drift apart). One of the older, popular sound packages used a different table from the one I typically saw from the telephony hardware vendors.
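The table mechanism can be sketched with IMA ADPCM, whose tables are publicly documented. This is a stand-in for illustration only: it works on 16 bit samples, and I have no information that ZyXEL ADPCM uses these tables or this bit layout. Class and function names are my own.

```python
# IMA ADPCM step sizes (89 entries) and per-code index adjustments.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


class AdpcmState:
    """Predictor value and table position shared by encoder and decoder."""

    def __init__(self):
        self.predicted = 0
        self.index = 0


def decode_nibble(state, nibble):
    """Apply one 4 bit code: scale the step, then move within the table."""
    step = STEP_TABLE[state.index]
    diff = step >> 3
    if nibble & 1:
        diff += step >> 2
    if nibble & 2:
        diff += step >> 1
    if nibble & 4:
        diff += step
    if nibble & 8:
        diff = -diff
    state.predicted = max(-32768, min(32767, state.predicted + diff))
    state.index = max(0, min(88, state.index + INDEX_ADJUST[nibble & 7]))
    return state.predicted


def encode_sample(state, sample):
    """Quantize the prediction error to 4 bits against the current step."""
    step = STEP_TABLE[state.index]
    diff = sample - state.predicted
    nibble = 0
    if diff < 0:
        nibble = 8
        diff = -diff
    if diff >= step:
        nibble |= 4
        diff -= step
    if diff >= step >> 1:
        nibble |= 2
        diff -= step >> 1
    if diff >= step >> 2:
        nibble |= 1
    decode_nibble(state, nibble)  # track exactly what the decoder will do
    return nibble
```

The encoder runs the decoder's update on its own state, so both sides stay in step only as long as they use identical tables and see identical codes. A mismatched table or a damaged nibble makes the two states diverge.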
And, for more useless trivia, there are multiple flavors of ADPCM. The variations involve the table, the source sample size and the destination sample size, but I've never needed to work with them. I just documented the flavors I found when I searched the internet for specifications of the various audio formats used in telephony.
Piping your PCM through
ffmpeg -f u16le -i - -f wav -acodec adpcm_ms -
will likely work.
http://ffmpeg.org/