Continuous speech recognition while singing?

Posted on 2024-12-01 06:03:26

As part of my application I'm looking to add speech recognition, but not really in the traditional sense. I have a set of lyrics (divided into verses) that are sung by someone, and the idea is to find which verse is currently being sung so it can be displayed on screen.

I've played around with Sphinx and got some basic examples set up and working. There seems to be plenty of documentation on recognising spoken text where you wait for a pause and then process the result, but I can't find much on recognising sentences continuously. And that is before I even get to the part where the words are sung rather than spoken!

Does anyone have experience with this, and if so, is there anywhere that would provide a good starting point? Or is what I'm trying to achieve too ambitious for Sphinx, so that it is never going to work properly? I'm open to looking at other libraries, but they must be free, and Sphinx was the most widely discussed one I could find.

Comments (1)

々眼睛长脚气 2024-12-08 06:03:26

It's perfectly possible to recognize speech as it is being spoken, with a small delay, and it is easier still if you know roughly what you expect to hear. This is called a "partial result" and is available through the API in all CMUSphinx decoders. Basically, you can retrieve the current hypothesis while decoding is still in progress.
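Since the lyrics are known in advance, a partial hypothesis does not have to be perfect; it only has to resemble one verse more than the others. Here is a minimal sketch of that idea, assuming the partial hypothesis arrives as a plain string. The names `tokenize` and `best_verse` and the word-overlap score are illustrative assumptions, not part of CMUSphinx.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (hypothetical helper)."""
    return re.findall(r"[a-z']+", text.lower())


def best_verse(hypothesis, verses):
    """Return (index, score) of the verse sharing the most words with the hypothesis.

    The score is the fraction of hypothesis words that also occur in the verse.
    Returns (None, 0.0) when nothing matches at all.
    """
    hyp_counts = Counter(tokenize(hypothesis))
    total = sum(hyp_counts.values())
    best_index, best_score = None, 0.0
    for i, verse in enumerate(verses):
        verse_counts = Counter(tokenize(verse))
        # Multiset intersection counts each shared word at most as often
        # as it appears in both the hypothesis and the verse.
        overlap = sum((hyp_counts & verse_counts).values())
        score = overlap / total if total else 0.0
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score
```

In practice you would probably only switch the displayed verse once the score passes some threshold, so that a few misrecognized words do not make the display jump around.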

One issue to consider is how to stabilize this result, that is, how to extract the part of it that will no longer change. The technique for this is called backtracking, and it can be implemented easily.
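To illustrate what "extracting the stable part" can mean, here is a simple heuristic sketch. It is not the decoder's own backtracking; it just treats the words that the last few partial hypotheses agree on, from the start, as stable. The class name `PartialStabilizer` and the `window` parameter are assumptions made for this example, and the partial hypotheses are assumed to arrive as plain strings.

```python
from collections import deque


class PartialStabilizer:
    """Keep the last `window` partial hypotheses and report their common prefix."""

    def __init__(self, window=3):
        # Only the most recent `window` hypotheses are kept.
        self.recent = deque(maxlen=window)

    def update(self, partial_text):
        """Add a new partial hypothesis and return the list of stable words."""
        self.recent.append(partial_text.split())
        if len(self.recent) < self.recent.maxlen:
            return []  # not enough history yet to call anything stable
        stable = []
        # Walk the hypotheses word by word; stop at the first disagreement
        # or when the shortest hypothesis runs out.
        for words in zip(*self.recent):
            if all(w == words[0] for w in words):
                stable.append(words[0])
            else:
                break
        return stable
```

A larger `window` gives a more reliable stable part at the cost of more delay before words are reported.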

For singing it is also doable, provided the music can be filtered out.
