speech_to_text() time is extremely slow with Google's speech recognition library in Python
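As the title states, I am trying to build a continuously listening AI named Sapphire using the speech_recognition library. For about a minute after a fresh start it works fine, but once it has been running for more than a minute, speech_to_text() takes a very long time to return.
Any help would be appreciated. Perhaps I do not understand these functions well enough, or there may be a way to stop speech_to_text() after a certain time.
In addition to the voice version, I also run a texting/email version of the bot using threading, but I had this problem with speech_to_text() before threading was involved.
Thank you for your help!
Here is the output: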
Me --> Sapphire what time is it
speech_to_text() Time = 5.611827599990647
Sapphire --> 16:46.
Listening...
Me --> ERROR
speech_to_text() Time = 3.4650153999973554
Listening...
Me --> ERROR
speech_to_text() Time = 6.241592899998068
Listening...
Me --> ERROR
speech_to_text() Time = 12.198483600004693
Listening...
Me --> ERROR
speech_to_text() Time = 3.7981161000061547
Listening...
Me --> shoe stamps
speech_to_text() Time = 51.52946890000021
Listening...
Me --> ERROR
speech_to_text() Time = 6.57019980000041
Listening...
Me --> ERROR
speech_to_text() Time = 46.647391800011974
Listening...
Here is my code to run the Sapphire AI:
import os
import threading
import timeit

import numpy as np
import speech_recognition as sr
import transformers
import vlc
from gtts import gTTS

# Note: ai, nlp, action_time() and sapphire_email() are used below but are
# not defined in the posted code.


class ChatBot():
    def __init__(self, name):
        print("----- Starting up", name, "-----")
        self.name = name

    def speech_to_text(self):
        recognizer = sr.Recognizer()
        # with sr.Microphone(device_index=3) as mic:
        with sr.Microphone() as mic:
            # Recalibrates the energy threshold on every call
            recognizer.adjust_for_ambient_noise(mic)
            print("Listening...")
            audio = recognizer.listen(mic)
        self.text = "ERROR"
        try:
            self.text = recognizer.recognize_google(audio)
            print("Me --> ", self.text)
        except Exception:
            print("Me --> ERROR")

    @staticmethod
    def text_to_speech(text):
        if text == "":
            print("ERROR")
        else:
            print((ai.name + " --> "), text)
            speaker = gTTS(text=text, lang="en", slow=False)
            speaker.save("res.mp3")
            vlc_instance = vlc.Instance("--no-video")
            player = vlc_instance.media_player_new()
            media = vlc_instance.media_new("res.mp3")
            player.set_media(media)
            player.play()

    def wake_up(self, text):
        return self.name.lower() in text.lower()


def parse_input(txt):
    ## action time
    if "time" in txt and "is" in txt and "it" in txt:
        res = action_time()
    elif ai.name.lower() in txt:
        res = np.random.choice(
            ["That's me!, Sapphire!", "Hello I am Sapphire the AI", "Yes I am Sapphire!", "My name is Sapphire, okay?!", "I am Sapphire and I am alive!",
             "It's-a Me!, Sapphire!"])
    ## respond politely
    elif any(i in txt for i in ["thank", "thanks"]):
        res = np.random.choice(
            ["you're welcome!", "anytime!", "no problem!", "cool!", "I'm here if you need me!",
             "mention not."])
    elif any(i in txt for i in ["exit", "close"]):
        res = np.random.choice(
            ["Tata!", "Have a good day!", "Bye!", "Goodbye!", "Hope to meet soon!", "peace out!"])
        # Note: this only sets a local variable; it does not stop the loop
        # in sapphire_audio().
        ex = False
    ## conversation
    else:
        if txt == "ERROR":
            # res="Sorry, come again?"
            res = ""
        else:
            starttime1 = timeit.default_timer()
            chat = nlp(transformers.Conversation(txt), pad_token_id=50256)
            endtime1 = timeit.default_timer()
            print("Transformer Time = ", (endtime1 - starttime1))
            res = str(chat)
            res = res[res.find("bot >> ") + 6:].strip()
    return res


def sapphire_audio():
    ex = True
    start = 0
    while ex:
        # Time the whole call: calibration + listening + recognition
        starttime1 = timeit.default_timer()
        ai.speech_to_text()
        endtime1 = timeit.default_timer()
        print("speech_to_text() Time = ", (endtime1 - starttime1))
        ## wake up
        if ai.wake_up(ai.text) is True:
            # remove Sapphire from phrase
            ai.text = ai.text.lower().replace(ai.name.lower(), "", 1)
            if start == 0:
                res = "Hello I am Sapphire the AI, what can I do for you?"
                start = 1
            else:
                res = parse_input(ai.text)
            ai.text_to_speech(res)


if __name__ == "__main__":
    os.environ["TOKENIZERS_PARALLELISM"] = "true"
    # sapphire_email()
    threading.Thread(target=sapphire_email).start()
    threading.Thread(target=sapphire_audio).start()
Comments (1)
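First of all, try to measure which method takes that long to execute: is it listen() or recognize_google()?
Try calling recognizer.adjust_for_ambient_noise(mic) just once at the beginning, rather than every time speech_to_text() is called, and see what happens after that.
recognizer.listen(mic) waits for the audio from your microphone to drop to a threshold set by recognizer.adjust_for_ambient_noise(mic). I assume the threshold is sometimes set so low that you have to wait a very long time for the ambient noise to reach that level. (Check your mic in Audacity? Listen to the recording and analyze whether the ambient noise changes from time to time.)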
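The first two suggestions can be sketched roughly as follows. This is a minimal sketch rather than the asker's code: `timed` and `profile_speech_to_text` are hypothetical helper names, and the `rounds` parameter is an assumption made for illustration. It calibrates once, prints the resulting `energy_threshold`, and then times listen() and recognize_google() separately so the slow call shows up in the log.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def profile_speech_to_text(rounds=5):
    """Time listen() and recognize_google() separately (hypothetical helper)."""
    # Imported here so the timing helper above works without the library.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    def recognize(audio):
        # Return "ERROR" instead of raising, mirroring the question's code.
        try:
            return recognizer.recognize_google(audio)
        except (sr.UnknownValueError, sr.RequestError):
            return "ERROR"

    with sr.Microphone() as mic:
        # Calibrate once, up front, instead of on every call.
        recognizer.adjust_for_ambient_noise(mic)
        print("energy_threshold =", recognizer.energy_threshold)
        for _ in range(rounds):
            print("Listening...")
            audio, listen_s = timed(recognizer.listen, mic)
            text, recog_s = timed(recognize, audio)
            print(f"listen() = {listen_s:.2f}s, "
                  f"recognize_google() = {recog_s:.2f}s, text = {text!r}")
```

Calling `profile_speech_to_text()` with a microphone attached prints one timing line per round. If listen() dominates, the threshold explanation is the likelier cause; if recognize_google() dominates, the network is.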
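Also, you are sending that audio to Google's servers using the public API key. It is only a guess, but uploading long audio clips over a home internet connection with modest upload speed may add some delay. And since you are sending many requests on the public API key, Google may not be prioritizing them, which could add further delay.
But that is just a guess. Try what I suggested at the beginning and we will figure it out.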
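If listen() turns out to be the slow call, one option not mentioned above is to bound it with the `timeout` and `phrase_time_limit` arguments of listen(), which also caps how much audio gets uploaded. The following is a minimal sketch: the helper name `listen_bounded` and the 5 s / 10 s limits are illustrative assumptions, not values taken from the question.

```python
try:
    from speech_recognition import WaitTimeoutError
except ImportError:
    # Fallback so the helper can be exercised without the library installed.
    class WaitTimeoutError(Exception):
        pass


def listen_bounded(recognizer, source, timeout=5, phrase_time_limit=10):
    """Return recorded audio, or None if no speech started within `timeout`.

    timeout: max seconds to wait for a phrase to start.
    phrase_time_limit: max seconds a phrase may run before it is cut off.
    Both defaults are illustrative assumptions.
    """
    try:
        return recognizer.listen(
            source, timeout=timeout, phrase_time_limit=phrase_time_limit
        )
    except WaitTimeoutError:
        # Nothing was said in time; let the caller loop again.
        return None
```

In the question's speech_to_text(), `audio = recognizer.listen(mic)` could be replaced by `audio = listen_bounded(recognizer, mic)`, skipping recognition when the result is None.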