How to separate male and female voices from an audio file (C++ or Java)

Posted on 2024-07-14 20:31:19


I want to differentiate between the male and female voices in an audio file and separate them. As output, I want the two voices separated. Can you please help me out, and can the coding be done in Java or C++?

Comments (5)

予囚 2024-07-21 20:31:19

This is potentially a very complicated question, and it is similar to writing your own speech recognition (or identification) algorithm.

You would start by converting the audio into the frequency domain, which is done using a Fast Fourier Transform (FFT).
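A minimal C++ sketch of that first step is below. It is an illustration, not code from this answer: it uses a naive O(N²) DFT so it has no dependencies (an FFT library computes the same numbers much faster), and the Hann window plus the names `magnitudeSpectrum` and `binToHz` are my own assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one frame (bins 0..N/2) using a Hann window and a
// direct O(N^2) DFT. An FFT computes the same values in O(N log N).
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame) {
    assert(!frame.empty());
    constexpr double kPi = 3.14159265358979323846;
    const std::size_t n = frame.size();

    // Hann window: tapers the frame edges to reduce spectral leakage.
    std::vector<double> windowed(n);
    for (std::size_t t = 0; t < n; ++t)
        windowed[t] = frame[t] * (0.5 - 0.5 * std::cos(2.0 * kPi * t / n));

    std::vector<double> mags(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * kPi * k * t / n;
            re += windowed[t] * std::cos(angle);
            im -= windowed[t] * std::sin(angle);
        }
        mags[k] = std::hypot(re, im);
    }
    return mags;
}

// Frequency in Hz of bin k for a frame of frameSize samples.
double binToHz(std::size_t k, std::size_t frameSize, double sampleRate) {
    return static_cast<double>(k) * sampleRate / static_cast<double>(frameSize);
}
```

Each bin k corresponds to k × sampleRate / N Hz, so with N = 512 samples at 8 kHz neighbouring bins are 15.625 Hz apart; a 500 Hz sine lands in bin 32.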

For each time slice you run an FFT on, you get a list of frequencies and their amplitudes. You will then need to detect the fundamental tone by analysing the harmonics. The 2nd and 3rd harmonics will be clearest. Working out which harmonics they are is very hard, especially with background noise and the natural differences between people's voices in terms of which harmonics are loudest. Then you can try to decide whether the speaker is male or female from whatever you estimated the fundamental to be.
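One standard way to turn "detect the fundamental by analysing the harmonics" into code is the harmonic product spectrum (HPS); this is a technique I am swapping in, not something the answer prescribes. The sketch multiplies the spectrum at bins k, 2k and 3k, so the bin whose 2nd and 3rd harmonics are also strong wins. The 60 to 400 Hz search range, the 165 Hz male/female threshold and the function names are illustrative assumptions; real voices overlap around that value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

namespace {
constexpr double kPi = 3.14159265358979323846;

// Hann-windowed magnitude spectrum via a direct DFT (an FFT is faster).
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    std::vector<double> windowed(n);
    for (std::size_t t = 0; t < n; ++t)
        windowed[t] = frame[t] * (0.5 - 0.5 * std::cos(2.0 * kPi * t / n));
    std::vector<double> mags(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * kPi * k * t / n;
            re += windowed[t] * std::cos(angle);
            im -= windowed[t] * std::sin(angle);
        }
        mags[k] = std::hypot(re, im);
    }
    return mags;
}
}  // namespace

// Harmonic product spectrum: score each candidate bin k by |X(k)|*|X(2k)|*|X(3k)|
// and return the best-scoring frequency in Hz (0 if no candidate fits).
double estimateF0Hps(const std::vector<double>& frame, double sampleRate,
                     double minHz = 60.0, double maxHz = 400.0) {
    assert(frame.size() >= 2 && sampleRate > 0.0 && minHz > 0.0 && minHz < maxHz);
    const std::vector<double> mags = magnitudeSpectrum(frame);
    const double hzPerBin = sampleRate / static_cast<double>(frame.size());
    const auto lo = static_cast<std::size_t>(std::ceil(minHz / hzPerBin));
    const auto hi = static_cast<std::size_t>(std::floor(maxHz / hzPerBin));
    std::size_t best = 0;
    double bestScore = -1.0;
    for (std::size_t k = (lo == 0 ? 1 : lo); k <= hi && 3 * k < mags.size(); ++k) {
        const double score = mags[k] * mags[2 * k] * mags[3 * k];
        if (score > bestScore) { bestScore = score; best = k; }
    }
    return static_cast<double>(best) * hzPerBin;
}

// Hypothetical decision rule: lower pitch -> "male". The threshold is a guess.
std::string guessGender(double f0Hz, double thresholdHz = 165.0) {
    return f0Hz < thresholdHz ? "male" : "female";
}
```

With a 2048-sample frame at 8 kHz the bins are about 3.9 Hz apart, which limits pitch resolution; a longer frame or interpolation around the peak would refine it.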

Keep in mind that during many parts of speech, such as unvoiced consonants like 's' and 't', there is no tone, just noise. The algorithm will need to be pretty intelligent.
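A cheap way to skip those noise-only stretches before estimating pitch is to check each frame's loudness (RMS) and zero-crossing rate: hissy consonants cross zero far more often than voiced vowels, and silence is quiet. This is a common heuristic, not the answer's method, and the thresholds below are hypothetical defaults for samples scaled to [-1, 1] that will need tuning on real recordings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fraction of neighbouring sample pairs whose signs differ.
double zeroCrossingRate(const std::vector<double>& frame) {
    assert(frame.size() >= 2);
    std::size_t crossings = 0;
    for (std::size_t i = 1; i < frame.size(); ++i)
        if ((frame[i - 1] >= 0.0) != (frame[i] >= 0.0)) ++crossings;
    return static_cast<double>(crossings) / static_cast<double>(frame.size() - 1);
}

// Root-mean-square level of the frame.
double rms(const std::vector<double>& frame) {
    assert(!frame.empty());
    double sum = 0.0;
    for (double s : frame) sum += s * s;
    return std::sqrt(sum / static_cast<double>(frame.size()));
}

// Heuristic: voiced speech is reasonably loud and crosses zero rarely;
// hiss crosses zero often; silence is quiet. Both thresholds are guesses.
bool looksVoiced(const std::vector<double>& frame,
                 double minRms = 0.01, double maxZcr = 0.25) {
    return rms(frame) > minRms && zeroCrossingRate(frame) < maxZcr;
}
```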

Hope that sets you in the right general direction.

Note: if the two voices are simultaneous and you want to separate them cleanly, then this won't help you. I don't believe anyone alive has solved such a problem.

相思故 2024-07-21 20:31:19

I think this is already possible. I just started taking an online course on Machine Learning from Stanford University with Professor Andrew Ng, and in the first lecture he shows a demo in which an audio recording of two overlapping voices is processed and the individual voices are extracted (he does the same with a recording of background music and a person speaking). Apparently it uses an unsupervised learning algorithm that extracts the two underlying patterns. You may want to look into that course (one version of it is here: http://www.academicearth.org/courses/machine-learning).
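If the demo is the classic "cocktail party" setup (two microphones, two speakers), the unsupervised algorithm usually associated with it is Independent Component Analysis (ICA). The sketch below is a brute-force two-channel ICA written for this page, not taken from the course: it whitens the two recordings, then searches for the rotation that makes both outputs as non-Gaussian (largest |excess kurtosis|) as possible. It assumes you have at least as many recordings (microphones) as voices, which a single mono file does not satisfy; `separateTwoSources` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Signal = std::vector<double>;

namespace {
constexpr double kPi = 3.14159265358979323846;

// Excess kurtosis E[y^4] - 3 of a zero-mean, unit-variance signal;
// 0 for a Gaussian, so |value| measures non-Gaussianity.
double excessKurtosis(const Signal& y) {
    double m4 = 0.0;
    for (double v : y) m4 += v * v * v * v;
    return m4 / static_cast<double>(y.size()) - 3.0;
}
}  // namespace

// Two-channel ICA: whiten, then grid-search the unmixing rotation.
// Outputs come back in arbitrary order, sign and scale.
std::array<Signal, 2> separateTwoSources(const Signal& mix1, const Signal& mix2) {
    assert(mix1.size() == mix2.size() && mix1.size() > 1);
    const std::size_t n = mix1.size();
    const double len = static_cast<double>(n);

    double m1 = 0.0, m2 = 0.0;  // means, used to centre both recordings
    for (std::size_t i = 0; i < n; ++i) { m1 += mix1[i]; m2 += mix2[i]; }
    m1 /= len; m2 /= len;

    double a = 0.0, b = 0.0, c = 0.0;  // covariance matrix [[a, b], [b, c]]
    for (std::size_t i = 0; i < n; ++i) {
        const double x1 = mix1[i] - m1, x2 = mix2[i] - m2;
        a += x1 * x1; b += x1 * x2; c += x2 * x2;
    }
    a /= len; b /= len; c /= len;

    // Eigenvectors of a symmetric 2x2 matrix form a rotation by phi.
    const double phi = 0.5 * std::atan2(2.0 * b, a - c);
    const double cp = std::cos(phi), sp = std::sin(phi);
    const double l1 = a * cp * cp + 2.0 * b * sp * cp + c * sp * sp;
    const double l2 = a * sp * sp - 2.0 * b * sp * cp + c * cp * cp;
    assert(l1 > 0.0 && l2 > 0.0);  // recordings must not be scaled copies

    Signal z1(n), z2(n);  // whitened: uncorrelated, unit variance
    for (std::size_t i = 0; i < n; ++i) {
        const double x1 = mix1[i] - m1, x2 = mix2[i] - m2;
        z1[i] = (cp * x1 + sp * x2) / std::sqrt(l1);
        z2[i] = (-sp * x1 + cp * x2) / std::sqrt(l2);
    }

    // Remaining unmixing is a rotation: try 0 to 89.5 degrees in 0.5 degree steps.
    std::array<Signal, 2> best;
    double bestScore = -1.0;
    Signal y1(n), y2(n);
    for (int step = 0; step < 180; ++step) {
        const double theta = step * (kPi / 360.0);
        const double ct = std::cos(theta), st = std::sin(theta);
        for (std::size_t i = 0; i < n; ++i) {
            y1[i] = ct * z1[i] + st * z2[i];
            y2[i] = -st * z1[i] + ct * z2[i];
        }
        const double score = std::fabs(excessKurtosis(y1)) + std::fabs(excessKurtosis(y2));
        if (score > bestScore) { bestScore = score; best[0] = y1; best[1] = y2; }
    }
    return best;
}
```

Because ICA cannot recover order, sign or scale, deciding which output is the male voice is a separate step (for example, comparing their average pitch).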

冷心人i 2024-07-21 20:31:19

One tool that makes this possible is LIUM spkdiarization. Written in Java and available under the GPL, it is a speaker diarization toolkit and uses statistical models for male, female and child voices. Luckily for you, the models are provided, so you can use it without having to tag recordings and train the models yourself.

See the scripting page of the LIUM wiki for examples; search the page for "gender".

肩上的翅膀 2024-07-21 20:31:19

I would start by saying this is impossible. Speech recognition is really, really hard.

You're not clear in your question - are the voices overlapping? If so, splitting them up will be absurdly difficult.

If they are separate, your best bet is to collect a large set of male and female voice samples and look for common characteristics (and a way to identify them programmatically). If the samples aren't recorded cleanly (if they have background noise), things get even more complicated.
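A minimal version of that idea, assuming you have already reduced each labelled recording to one number (for example its median pitch in Hz), is a nearest-centroid classifier: learn the male and female averages, then assign a new recording to whichever is closer. `Sample` and `CentroidClassifier` are hypothetical names; a practical system would use many features (spectral envelope as well as pitch) and a proper statistical model.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical training example: one feature per recording plus its known label.
struct Sample {
    double feature;
    bool isMale;
};

// Nearest-centroid classifier on a single feature.
class CentroidClassifier {
public:
    void train(const std::vector<Sample>& samples) {
        double maleSum = 0.0, femaleSum = 0.0;
        int maleCount = 0, femaleCount = 0;
        for (const Sample& s : samples) {
            if (s.isMale) { maleSum += s.feature; ++maleCount; }
            else { femaleSum += s.feature; ++femaleCount; }
        }
        assert(maleCount > 0 && femaleCount > 0);  // need examples of both classes
        maleCentroid_ = maleSum / maleCount;
        femaleCentroid_ = femaleSum / femaleCount;
    }

    // Label of whichever class average is closer.
    std::string classify(double feature) const {
        return std::fabs(feature - maleCentroid_) <= std::fabs(feature - femaleCentroid_)
                   ? "male" : "female";
    }

    // Implied decision boundary: halfway between the two averages.
    double boundary() const { return 0.5 * (maleCentroid_ + femaleCentroid_); }

private:
    double maleCentroid_ = 0.0;
    double femaleCentroid_ = 0.0;
};
```

Training on pitches of 110, 120 and 130 Hz labelled male and 200, 210 and 220 Hz labelled female puts the decision boundary at 165 Hz.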

You may get away with using the average pitch: male voices are generally lower-pitched than female voices.
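Averaging pitch over a whole recording could look like the sketch below. It uses time-domain autocorrelation (a different pitch estimator from the FFT approach discussed earlier) and skips frames whose best normalised correlation is 0.5 or less, treating them as unvoiced or silent. The 0.5 cut-off, the 60 to 400 Hz search range, the 165 Hz threshold in `guessGender` and all function names are assumptions, not measured values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Autocorrelation pitch estimate for x[start, start + len). Returns 0 when no
// lag correlates strongly (normalised value <= 0.5), i.e. the frame looks unvoiced.
double framePitch(const std::vector<double>& x, std::size_t start, std::size_t len,
                  double sampleRate, double minHz = 60.0, double maxHz = 400.0) {
    assert(start + len <= x.size() && minHz > 0.0 && maxHz < sampleRate);
    const auto minLag = static_cast<std::size_t>(std::ceil(sampleRate / maxHz));
    const auto maxLag = static_cast<std::size_t>(std::floor(sampleRate / minHz));
    double energy = 0.0;
    for (std::size_t i = 0; i < len; ++i) energy += x[start + i] * x[start + i];
    if (energy <= 0.0) return 0.0;  // silence

    double bestR = 0.0;
    std::size_t bestLag = 0;
    for (std::size_t lag = minLag; lag <= maxLag && lag < len; ++lag) {
        double r = 0.0;
        for (std::size_t i = 0; i + lag < len; ++i) r += x[start + i] * x[start + i + lag];
        r /= energy;  // normalised by frame energy; shrinks with lag as overlap shrinks
        if (r > bestR) { bestR = r; bestLag = lag; }
    }
    return bestR > 0.5 ? sampleRate / static_cast<double>(bestLag) : 0.0;
}

// Mean pitch over voiced, non-overlapping frames; 0 if no frame is voiced.
double averagePitch(const std::vector<double>& x, double sampleRate, std::size_t frameLen) {
    assert(frameLen > 0);
    double sum = 0.0;
    std::size_t voiced = 0;
    for (std::size_t start = 0; start + frameLen <= x.size(); start += frameLen) {
        const double f0 = framePitch(x, start, frameLen, sampleRate);
        if (f0 > 0.0) { sum += f0; ++voiced; }
    }
    return voiced > 0 ? sum / static_cast<double>(voiced) : 0.0;
}

// Assumed threshold between typical male and female speaking pitch.
std::string guessGender(double averageHz, double thresholdHz = 165.0) {
    return averageHz < thresholdHz ? "male" : "female";
}
```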

小情绪 2024-07-21 20:31:19

What you are asking is one hell of a task. thomasrutter wrote some "pointers" on how to do it, but I guess the algorithm would have to be really, really robust if you want to use it everywhere, in all sorts of music (including singing, of course). Maybe it would be better/easier to start by separating a single instrument from a song.
