I have an .mp3 file. How can I separate the human voice from other sounds in C?
Is it even possible in C? (I know it is possible in general; GOM Player does it.) I just need something to get started. What do you say?
How exactly do you distinguish the human voice from other sounds?
Filters in MP3 players usually rely on the fact that the voice source (the performer) in a stereo studio recording is positioned at the center. So they just compute the difference between the channels. If you give them a recording where the performer is not positioned like that, they fail: the voice is not extracted.
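As a rough sketch of the channel-difference trick described above, the following C function splits interleaved 16-bit stereo PCM into a mid signal, (L + R) / 2, and a side signal, (L - R) / 2. Anything mixed identically into both channels, which is the assumption made about the vocal, cancels in the side signal and stays in the mid signal together with every other centered instrument. The name `split_mid_side` and the buffer layout are assumptions made for illustration; decoding the MP3 to PCM is not shown and needs a separate decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * stereo: interleaved L,R,L,R,... 16-bit samples (frames * 2 values).
 * mid:    receives (L + R) / 2, where center-panned content is kept.
 * side:   receives (L - R) / 2, where center-panned content cancels.
 * Both results always fit in 16 bits, so no clipping is needed.
 */
void split_mid_side(const int16_t *stereo, size_t frames,
                    int16_t *mid, int16_t *side)
{
    assert(stereo != NULL && mid != NULL && side != NULL);
    for (size_t i = 0; i < frames; i++) {
        int32_t l = stereo[2 * i];      /* widen to avoid overflow */
        int32_t r = stereo[2 * i + 1];
        mid[i]  = (int16_t)((l + r) / 2);
        side[i] = (int16_t)((l - r) / 2);
    }
}
```

The side buffer is what a karaoke-style vocal remover would play. Neither buffer isolates the voice on its own, so treat this as a starting point rather than a voice extractor.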
The reliable way is to employ a voice detector. This is a very complex problem that involves hardcore math and thorough tuning of the algorithms for your specific task. If you go this way, start by reading up on voice coding (vocoders).
This exact topic was discussed here. It started out as a discussion of audio coding technologies, but on the linked page above someone said
But it was pointed out that extracting the voice should be no more difficult than eliminating the voice.
I'll let you read further, but I suspect successful extraction may rely on the relatively narrow spectral distribution of the voice compared to instruments.
Note that it is not possible, in principle, to perfectly separate different sounds that have been mixed together in one track. It's like mixing cream into your coffee: once it has been mixed in, you can't perfectly separate the cream from the coffee.
There might be smart signal processing tricks that give an acceptable result, but in general it's impossible to perfectly separate the voice from the music.
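To put the cream-and-coffee point in symbols (the names x, v and m are labels introduced here, not taken from the thread), a single mixed track of N samples can be written as:

```latex
x[n] = v[n] + m[n], \qquad n = 0, \dots, N-1
```

Here x is the recorded track, v the voice and m the music. That is N equations in 2N unknowns: for any guess \hat{v}, the choice \hat{m} = x - \hat{v} reproduces the mix exactly, so the mix alone cannot single out the true pair. Any practical method has to add assumptions about what a voice or the accompaniment looks like.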
Separating the human voice from other sounds is no mean feat. If you have a recording of the other sounds, you can use reference cancellation to remove the background, which leaves you with the human voice.
If the background noise is random noise of some sort, you will gain something by using some form of spectral filtering. But it's not simple and would need a fair bit of experimentation to get good results. Adobe Audition has an adaptive spectral filter, I believe...
Assume you have white noise with a fairly even frequency distribution across the entire recorded band (on a 44 kHz uncompressed recording, that is 0 to 22 kHz). Then add a voice on top of it. Obviously the voice uses the same frequencies as the noise. The human voice ranges from ~300 Hz to ~3400 Hz. Bandpassing the audio will cut you down to only the voice range of 300 to 3400 Hz. Now what? You have a voice AND you have the now-bandpassed white noise. Somehow you need to remove that noise and leave the voice intact. There are various filtering schemes, but all will damage the voice in the process.
Good luck, it's really not going to be simple!
Look up Independent Component Analysis (ICA).
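For orientation, the standard ICA model can be stated in a few symbols (the notation is the usual textbook one and is not taken from the thread):

```latex
\mathbf{x}(t) = A\,\mathbf{s}(t), \qquad
\hat{\mathbf{s}}(t) = W\,\mathbf{x}(t), \quad W \approx A^{-1}
```

Here \mathbf{s} holds the unknown sources, A is an unknown mixing matrix, and \mathbf{x} holds the observed channels. ICA estimates W by making the components of \hat{\mathbf{s}} as statistically independent as possible, and it recovers the sources only up to scaling and ordering. The classic formulation needs at least as many observed channels as sources, so a stereo file gives it just two mixtures to work with.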
Where buf holds the PCM WAV input data at a 44100 Hz sample rate.