Is there a library that does what the Levelator does, for .NET?
The Levelator is a program that takes an audio file and generates another one with a more constant volume, correcting recording problems such as one person sounding too loud or being barely audible.
Do you know of any libraries I could use from .NET on Windows to perform the same task? A command-line program would be good enough too.
In fact, the Levelator is neither a compressor nor a normalizer. Yes, it normalizes, but it does much more and has a lot more smarts than what you can do with sox, etc. Think of it as the hand on a fader that knows in advance what will happen and will even know when to leave well enough alone. Check out the algorithm discussion here: http://www.conversationsnetwork.org/levelatorAlgorithm
...doug (Levelator co-creator)
A command line program that does this is sox.
The general idea with the algorithm is to find the highest absolute value sample (audio should be centered, whatever the measurement of the sampled data).
You divide your maximum possible value by this number (which is guaranteed to be equal or smaller), and then you multiply that by your desired peak level (i.e. do you want it to reach 0.95 of max? the full 1.0?). If the result is not one, it becomes your scale value. Then you iterate through your file and multiply every sample by that number.
For example, with CD-quality audio the highest possible absolute value for a sample is 32767 (fudging this to make the example easier: the real range is -32768 to 32767, but treating 32767 as your max makes things much simpler here). So if you scanned through and the highest absolute value you found was 18000, then your amplification factor will be 1.8203888..., and if you want your max volume to be 0.9887997070223 * the max available, that gives you a new scale factor of 1.8. So you loop through the array holding the audio file, and replace the previous value for each sample with the value * 1.8.
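The steps above can be sketched in a few lines. This is a minimal illustration in Python rather than .NET, assuming 16-bit samples already loaded into a list; the function name `peak_normalize` and its parameters are made up for this example, and the same loop ports directly to C#.

```python
def peak_normalize(samples, target_peak=1.0, full_scale=32767):
    """Scale 16-bit samples so the loudest one lands at target_peak * full_scale."""
    # Step 1: find the highest absolute sample value.
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale

    # Step 2: headroom factor times the desired peak level.
    scale = (full_scale / peak) * target_peak

    # Step 3: multiply every sample, clamping to the legal 16-bit range
    # to guard against rounding overshoot.
    return [max(-32768, min(32767, round(s * scale))) for s in samples]


# The worked example from the text: peak of 18000, overall scale factor 1.8.
samples = [0, 9000, -18000, 4500]
louder = peak_normalize(samples, target_peak=1.8 * 18000 / 32767)
print(louder)  # [0, 16200, -32400, 8100]
```

Computing the scale once and applying it uniformly keeps the relative dynamics of the recording unchanged, which is exactly why plain normalization cannot fix a speaker who is too quiet next to one who is too loud.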
This can be improved by running a click filter first, to eliminate spurious transients, and by a high-pass (DC-offset and rumble) filter, which makes sure the waveform is evenly centered around zero by removing low-frequency components that cannot be produced by speakers or heard by the human ear. (The term de-essing normally refers to something else: reducing sibilance.) The click filter is a lowpass, and the rumble filter is a highpass. Once these filters are run, there will be more room for amplifying the sound without introducing distortion.
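As a rough illustration of the high-pass step, here is a sketch of a one-pole DC-blocking filter in Python. The answer does not say which filter design to use, so this particular recurrence, the name `dc_block`, and the coefficient `r=0.995` are assumptions chosen for simplicity, not the author's method.

```python
def dc_block(samples, r=0.995):
    """One-pole high-pass filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    r close to 1.0 removes only the very lowest frequencies (an assumed
    default, tune it for the sample rate in use).
    """
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = x - prev_in + r * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out


# A signal that is nothing but a constant offset decays toward zero.
offset_signal = [1000.0] * 5000
filtered = dc_block(offset_signal)
print(abs(filtered[-1]) < 1.0)  # True
```

Removing the offset before normalizing matters because an off-center waveform hits full scale on one side early, wasting headroom on the other side.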
The technique you're looking for is called audio normalization. This third-party code, Mp3SoundCapture, provides a way to do it, but it's a separate app, not a library.
There are two main ways to approach this problem:
Normalization: this simply involves searching for the loudest part of the audio, then amplifying the whole file so that the loudest part goes to maximum volume. This technique is only useful if the loudest part is 50% volume or less. If you have a single spike somewhere in your input file that hits max volume, then normalization does nothing for you.
Compression / limiting: this takes a slightly different approach and is used extensively in music recording. The basic idea is that any sound over a certain volume (called the 'threshold') gets made quieter (or in the case of the limiter, no sound is allowed through over a certain volume). This has the effect of evening out the volume of the whole recording (the quiet bits stay the same, and the loud bits get quieter). Then you are able to amplify the whole signal without distorting it (this is called make-up gain). See this article on dynamic range compression for more info.
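To make the threshold and make-up gain idea concrete, here is a deliberately simplified sketch in Python. It works sample by sample on floats in the range -1.0 to 1.0, whereas a real compressor follows a smoothed level envelope with attack and release times; the name `compress` and the parameter defaults are assumptions for illustration only.

```python
def compress(samples, threshold=0.5, ratio=4.0, makeup_gain=1.0):
    """Hard-knee compressor sketch on float samples in [-1.0, 1.0].

    No attack/release smoothing: each sample is treated on its own.
    """
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Only the part above the threshold is reduced by the ratio.
            level = threshold + (level - threshold) / ratio
        # Restore the sign, then apply make-up gain to the whole signal.
        out.append(makeup_gain * (level if s >= 0 else -level))
    return out


quiet_and_loud = [0.1, -0.2, 0.9, -1.0]
evened = compress(quiet_and_loud, threshold=0.5, ratio=4.0, makeup_gain=1.6)
print(evened)
```

With these settings the full-scale peak is pulled down to 0.625 before make-up gain, so a gain of 1.6 brings it back to roughly full scale while the quiet 0.1 sample rises to about 0.16. The gap between loud and quiet narrows, which plain normalization cannot do.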
As for implementing this in .NET, NAudio will let you read the samples in an input WAV file, allowing you to create your own normalization effect. I have also demonstrated in Skype Voice Recorder how you can implement a compressor in .NET.
The final thing you should be aware of is that these algorithms only work if you have access to the sample values. So if, for example, your file is MP3, you need to first convert to PCM, then apply the normalization / compression, and finally convert back to MP3.
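Once the audio is in PCM form, the sample values are directly accessible. The sketch below uses Python's standard `wave` and `array` modules to peak-normalize a 16-bit WAV file; decoding from MP3 and re-encoding afterwards is assumed to be handled by a separate tool and is not shown. The function name `normalize_wav` and the 0.95 default are illustrative choices.

```python
import array
import sys
import wave


def normalize_wav(src_path, dst_path, target_peak=0.95):
    """Peak-normalize a 16-bit PCM WAV file and write the result to dst_path."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    peak = max((abs(s) for s in samples), default=0)
    if peak:
        scale = target_peak * 32767 / peak
        samples = array.array(
            "h", (max(-32768, min(32767, round(s * scale))) for s in samples)
        )

    if sys.byteorder == "big":
        samples.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(samples.tobytes())


# Hypothetical usage, with file names made up for this example:
# normalize_wav("interview.wav", "interview_normalized.wav")
```

Interleaved stereo data needs no special handling here, because the same scale factor is applied to every sample regardless of channel.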