Best voice compression algorithm/format
We have some raw voice audio that we need to distribute over the internet. We need decent quality, but it doesn't need to be of musical quality. Our main concern is usability by the consumer (i.e. what and where they can play it) and size of the download. My experience has shown that mp3s do not produce the best compression numbers for voice audio, but I am at a loss for what the best alternatives are. Ultimately we would like to automate the conversion process to allow the consumer to choose the quality vs. size level that they would like.
Comments (6)
You should give Opus a try. Example compression command line:
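The snippet below is not the poster's original command. It is a rough sketch of what an Opus speech encode could look like, assuming `ffmpeg` built with `libopus` is installed; the function name, file names, and the 24 kbit/s default are illustrative choices only.

```python
import shlex


def opus_voice_command(src, dst, bitrate_kbps=24):
    """Build an ffmpeg argument list that encodes speech to Opus.

    Hypothetical helper: the name and defaults are illustrative.
    """
    return [
        "ffmpeg",
        "-i", src,                   # input file, e.g. a raw WAV recording
        "-ac", "1",                  # downmix to mono, usually enough for voice
        "-c:a", "libopus",           # use the libopus encoder
        "-b:a", f"{bitrate_kbps}k",  # target bitrate in kbit/s
        "-application", "voip",      # libopus mode tuned for speech
        dst,
    ]


if __name__ == "__main__":
    cmd = opus_voice_command("voice.wav", "voice.opus")
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Exposing the bitrate as a parameter is one way to let a consumer pick a quality versus size level later on.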
Start here.
As you rightly point out, voice compression is different from general audio compression. You'll find many codecs dedicated to telephony applications, ranging from PCM and ADPCM through later packet-based encodings such as CELP used on GSM cellular networks.
Still, VoIP voice encoding differs slightly from telephony encoding because of the medium used. You can find a good, free (unencumbered, BSD-licensed open source) library for speech encoding and decoding in Speex.
Again, which you choose depends on the speech you're encoding and the medium it's being transmitted over. Also note that many libraries have several algorithms they can use depending on the circumstances, and some will even switch on the fly based on conditions of the sound and network.
To get more help, narrow your question down.
-Adam
The compression formats most often used for live voice audio (such as VoIP telephony) are μ-law (also written mu-law or u-law; used in the US) and A-law (used in Europe and elsewhere). Unlike 16-bit linear PCM, they store each sample in 8 bits on a logarithmic scale, so they halve the storage at the cost of coarser amplitude steps for loud sounds. They are normally sampled at 8 kHz, so they also carry no frequencies above 4 kHz.
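To make the companding idea concrete, here is a minimal sketch of the continuous μ-law curve with μ = 255. It is not the exact segmented G.711 encoder used in telephony, and the function names are made up for illustration.

```python
import math

MU = 255  # standard mu value for 8-bit mu-law


def mulaw_compress(x):
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize_8bit(y):
    """Round a compressed value in [-1, 1] to one of 256 codes (0..255)."""
    return round((y + 1) / 2 * 255)


def dequantize_8bit(code):
    """Map an 8-bit code back to a compressed value in [-1, 1]."""
    return code / 255 * 2 - 1


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        code = quantize_8bit(mulaw_compress(sample))
        restored = mulaw_expand(dequantize_8bit(code))
        print(f"{sample:5.2f} -> code {code:3d} -> {restored:.4f}")
```

Quiet samples are spread over many more codes than loud ones, which is why 8 bits per sample is enough for intelligible speech.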
For usability's sake, it is easiest to use MPEG compression (MP2/MP3/MP4) for streaming to standard media players: the algorithms are readily available and typically quite fast, and almost all media players should support them. For voice, though, you might specify a lower bitrate, or do your conversion from a lower-sample-rate file in the first place (WAV can be at several sampling rates, and voice requires a much lower sampling rate than music or effects; it's roughly like frames per second in video). Alternatively, you can use Real Media, WMA, or other proprietary formats, but this would limit usability since users would need specific third-party software for playback, though WMA has an excellent compression ratio as well as compression options specific to voice audio.
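One way to turn the "lower bitrate, lower sampling rate" advice into an automated quality ladder is sketched below. It assumes `ffmpeg` with the `libmp3lame` encoder is available; the preset names and the specific rate and bitrate pairs are hypothetical starting points, not recommendations from the thread.

```python
# Hypothetical presets: (sample rate in Hz, bitrate in kbit/s).
PRESETS = {
    "low": (16000, 24),
    "medium": (22050, 32),
    "high": (44100, 64),
}


def mp3_voice_command(src, dst, level):
    """Build an ffmpeg argument list for a mono MP3 at the chosen level."""
    sample_rate, kbps = PRESETS[level]
    return [
        "ffmpeg",
        "-i", src,
        "-ac", "1",               # mono
        "-ar", str(sample_rate),  # resample to the preset's sampling rate
        "-c:a", "libmp3lame",     # MP3 encoder
        "-b:a", f"{kbps}k",       # constant target bitrate
        dst,
    ]


def estimated_bytes(seconds, level):
    """Approximate audio payload size, ignoring container overhead."""
    _, kbps = PRESETS[level]
    return seconds * kbps * 1000 // 8


if __name__ == "__main__":
    for level in PRESETS:
        print(level, estimated_bytes(60, level), "bytes per minute")
        print(" ".join(mp3_voice_command("voice.wav", f"voice_{level}.mp3", level)))
```

Showing the estimated size next to each level gives the consumer something concrete to choose from.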
Assuming your users will be running Windows, there is a WMA speech compression codec that you can use with the Windows Media Encoder SDK. Failing that, you can use ACM with something like G.723/G.728, ADPCM, mu-law, or a-law, some of which are installed as standard on Windows XP and above. These can be packaged inside WAV files. You'll need to experiment a little to find the right bitrate/quality (probably don't bother with mu-law or a-law). With voice data you can get away with quite low sample rates, e.g. 16000 or 8000 Hz, as most of what makes speech intelligible lies below about 4 kHz.
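The effect of the sample rate on size, before any codec is applied, is simple arithmetic. The sketch below assumes 16-bit mono PCM; the helper names are illustrative.

```python
def pcm_bytes(seconds, sample_rate_hz, bits_per_sample=16, channels=1):
    """Size of uncompressed PCM audio in bytes."""
    return seconds * sample_rate_hz * bits_per_sample // 8 * channels


def highest_frequency_hz(sample_rate_hz):
    """Nyquist limit: the highest frequency a sample rate can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for rate in (8000, 16000, 44100):
        size = pcm_bytes(60, rate)
        print(f"{rate} Hz: {size} bytes per minute, "
              f"content up to {highest_frequency_hz(rate):.0f} Hz")
```

One minute of 16-bit mono audio is 960,000 bytes at 8000 Hz versus 5,292,000 bytes at 44100 Hz, so dropping the sample rate shrinks the input several times over before the codec does any work.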
I think AMR is one of the best speech codecs. I was using it about a year ago, and I remember that the quality was very good and the file sizes were rather small.
One drawback, especially in your case, is that, as far as I know, it isn't supported by a wide range of media players. QuickTime and RealPlayer are two that I know can play .amr files.
Try Speex ... unencumbered by patents, good performance both size-wise and CPU-wise. I've been having good luck using it on iPhone.