Google Speech Recognition doesn't recognize certain words/phrases such as "um" and "er" | Python
It seems Google speech recognition is stripping certain parts of my speech, such as "um", "er" and "ahh". The problem is that I want these to be recognized, and I can't figure out how to enable this.
Here is the code:
import pyttsx3
import speech_recognition

recognizer = speech_recognition.Recognizer()
vocal_imperfections = 0
vi_list = ['hmm', 'umm', 'aha', 'ahh', 'uh', 'um', 'er']

while True:
    try:
        with speech_recognition.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            # show_all=True returns the raw response (a dict), or [] when nothing was recognized
            text = recognizer.recognize_google(audio, language='en-IN', show_all=True)
            #text = recognizer.recognize_ibm(audio)
            if text != []:
                text = text['alternative'][0]['transcript']
                if any(word in text for word in vi_list):
                    vocal_imperfections = vocal_imperfections+1
            print(text)
            print(vocal_imperfections)
    except speech_recognition.UnknownValueError:
        recognizer = speech_recognition.Recognizer()
        continue
It works as intended, except that Google takes out the vocal imperfections. Does anyone know how to enable this, or know of an alternative free, real-time speech recognition service that will recognize vocal imperfections?
Example:
If I were to say: "um, I think today is the 30th"
Google would return: "I think today is the 30th"
Comments (1)
I took a look at the Google Cloud Speech-to-text API docs and didn't see anything relevant (as of March 2022). I also came across these related resources:
All evidence suggests that this isn't possible with the Google Cloud Speech-to-Text service (at this time), and that you'll have to seek alternative services. I won't rehash the alternatives listed in the resources, but several are provided and you'll have to pick the one that best suits your particular needs.
Also, you may already know this (so apologies if you do), but these types of words are typically called "filler" and/or "hesitation" words. That might be helpful to you while researching the topic.
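As a side note on counting these words once an engine does return them: the substring test in the question (`word in text`) would also fire on ordinary words, since "er" occurs inside "there" and "um" inside "number". A minimal sketch of whole-word counting follows; the `count_fillers` helper and the `FILLERS` name are hypothetical, and unlike the question's loop it counts every occurrence rather than once per utterance.

```python
import re

# Filler words copied from the question's vi_list.
FILLERS = {"hmm", "umm", "aha", "ahh", "uh", "um", "er"}


def count_fillers(transcript, fillers=FILLERS):
    """Count whole-word filler tokens in a transcript, ignoring case."""
    # Split into alphabetic tokens so punctuation such as "um," does not matter.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return sum(1 for token in tokens if token in fillers)


print(count_fillers("um, I think there is, uh, a problem"))  # prints 2
```

Here "there" is not counted, because only complete tokens are compared against the filler set.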
The good news is that the SpeechRecognition module (I think that's what you're using based on your code) supports several different engines, so hopefully one of those provides filler words.
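One way to check this is to run the same recording through several engines and compare the transcripts. The sketch below is only an illustration under assumptions: `transcribe_with_engines` and `build_engines` are hypothetical helpers, `recognize_sphinx` additionally needs the pocketsphinx package installed, and nothing here guarantees that any particular engine keeps filler words.

```python
def transcribe_with_engines(audio, engines):
    """Run each engine callable on the same audio.

    Returns a dict mapping engine name to its transcript, or to a short
    failure note if that engine raised an exception.
    """
    results = {}
    for name, recognize in engines.items():
        try:
            results[name] = recognize(audio)
        except Exception as exc:  # each engine raises its own error types
            results[name] = f"<failed: {type(exc).__name__}>"
    return results


def build_engines():
    """Build engine callables (assumes the SpeechRecognition package is installed)."""
    import speech_recognition  # imported lazily so the helper above has no dependencies

    recognizer = speech_recognition.Recognizer()
    return {
        "google": lambda audio: recognizer.recognize_google(audio, language="en-IN"),
        "sphinx": lambda audio: recognizer.recognize_sphinx(audio),
    }
```

With the `audio` object captured by `recognizer.listen(mic)` in the question's loop, calling `transcribe_with_engines(audio, build_engines())` gives one transcript per engine, which makes it easy to see whether any of them preserves "um" or "er".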