voice_to_text() with the Google speech recognition library in Python is extremely slow

Posted 2025-01-16 20:45:48

As the title states, I am trying to build a continually listening AI named Sapphire using the speech_recognition library. For about a minute after starting the code fresh it works fine; however, once it has been running for more than a minute, speech_to_text() takes forever to run.

Any help would be appreciated; I am looking for some form of solution to this issue. Perhaps I am not understanding the functions well enough, or there may be a way to stop the speech_to_text() function after a certain time.

I am also running a texting/email version of the bot in addition to the voice version, using threading, but I was having this problem with speech_to_text() before threading was involved.

Thank you for your help!

Here is the output:

Me  -->  Sapphire what time is it
speech_to_text() Time =  5.611827599990647
Sapphire -->  16:46.
Listening...
Me  -->  ERROR
speech_to_text() Time =  3.4650153999973554
Listening...
Me  -->  ERROR
speech_to_text() Time =  6.241592899998068
Listening...
Me  -->  ERROR
speech_to_text() Time =  12.198483600004693
Listening...
Me  -->  ERROR
speech_to_text() Time =  3.7981161000061547
Listening...
Me  -->  shoe stamps
speech_to_text() Time =  51.52946890000021
Listening...
Me  -->  ERROR
speech_to_text() Time =  6.57019980000041
Listening...
Me  -->  ERROR
speech_to_text() Time =  46.647391800011974
Listening...

Here is my code to run the Sapphire AI:

import os
import threading
import timeit

import numpy as np
import speech_recognition as sr
import transformers
import vlc
from gtts import gTTS


class ChatBot():
    def __init__(self, name):
        print("----- Starting up", name, "-----")
        self.name = name

    def speech_to_text(self):
        recognizer = sr.Recognizer()
        # with sr.Microphone(device_index=3) as mic:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic)
            print("Listening...")
            audio = recognizer.listen(mic)
            self.text="ERROR"
        try:
            self.text = recognizer.recognize_google(audio)
            print("Me  --> ", self.text)
        except:
            print("Me  -->  ERROR")

    @staticmethod
    def text_to_speech(text):
        if text == "":
            print("ERROR")
        else:
            print((ai.name+" --> "), text)
            speaker = gTTS(text=text, lang="en", slow=False)
            speaker.save("res.mp3")

            vlc_instance = vlc.Instance("--no-video")
            player = vlc_instance.media_player_new()

            media = vlc_instance.media_new("res.mp3")

            player.set_media(media)
            player.play()


    def wake_up(self, text):
        return True if (self.name).lower() in text.lower() else False


def parse_input(txt):
    ## action time
    if "time" in txt and "is" in txt and "it" in txt:
        res = action_time()
    elif ai.name.lower() in txt:
        res = np.random.choice(
            ["That's me!, Sapphire!", "Hello I am Sapphire the AI", "Yes I am Sapphire!", "My name is Sapphire, okay?!", "I am Sapphire and I am alive!",
             "It's-a Me!, Sapphire!"])
    ## respond politely
    elif any(i in txt for i in ["thank", "thanks"]):
        res = np.random.choice(
            ["you're welcome!", "anytime!", "no problem!", "cool!", "I'm here if you need me!",
             "mention not."])
    elif any(i in txt for i in ["exit", "close"]):
        res = np.random.choice(
            ["Tata!", "Have a good day!", "Bye!", "Goodbye!", "Hope to meet soon!", "peace out!"])
        ex = False
    ## conversation
    else:
        if txt == "ERROR":
            # res="Sorry, come again?"
            res = ""
        else:
            starttime1 = timeit.default_timer()
            chat = nlp(transformers.Conversation(txt), pad_token_id=50256)
            endtime1 = timeit.default_timer()
            print("Transformer Time = ", (endtime1 - starttime1))
            res = str(chat)
            res = res[res.find("bot >> ") + 6:].strip()
    return res

def sapphire_audio():
    ex = True
    start = 0
    while ex:
        starttime1 = timeit.default_timer()
        ai.speech_to_text()
        endtime1 = timeit.default_timer()
        print("speech_to_text() Time = ", (endtime1 - starttime1))
        ## wake up
        if ai.wake_up(ai.text) is True:
            #remove Sapphire from phrase
            ai.text = ai.text.lower().replace(ai.name.lower(), "", 1)
            if start == 0:
                res = "Hello I am Sapphire the AI, what can I do for you?"
                start = 1
            else:
                res = parse_input(ai.text)
            ai.text_to_speech(res)

if __name__ == "__main__":

    os.environ["TOKENIZERS_PARALLELISM"] = "true"

    ai = ChatBot(name="Sapphire")

    # sapphire_email()
    threading.Thread(target=sapphire_email).start()
    threading.Thread(target=sapphire_audio).start()


Comments (1)

灯下孤影 2025-01-23 20:45:48


First of all, try to measure which method is taking that long to execute. Is it the listen() method or recognize_google()?
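A small helper makes it easy to split the measurement between the two calls (the `timed` helper and its labels are mine, not from the asker's code):

```python
import timeit

def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took under the given label, and return its result."""
    start = timeit.default_timer()
    result = fn(*args, **kwargs)
    print(label, "took", timeit.default_timer() - start, "s")
    return result

# Inside speech_to_text() it would be used like this (not run here,
# since it needs a live microphone):
#     audio = timed("listen()", recognizer.listen, mic)
#     text  = timed("recognize_google()", recognizer.recognize_google, audio)
```

Whichever label dominates tells you whether the stall is in the microphone/threshold logic or in the network round trip to Google.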

Try calling recognizer.adjust_for_ambient_noise(mic) just once at the beginning, rather than every time speech_to_text() is used, and see what happens after that.

The recognizer.listen(mic) function waits for the audio from your microphone to come back down below a threshold set by recognizer.adjust_for_ambient_noise(mic).

I assume the threshold is sometimes set so low that waiting for the audio to reach that ambient-noise level takes a very long time. (Check your mic in Audacity? Listen to it and analyze whether the ambient noise changes from time to time?)

Also, you are sending that audio to a Google server using the public API key. It's only a guess, but sending long audio clips over a not-so-great home internet upload connection may add some delay. And since you are sending many requests on the public API key, perhaps Google isn't prioritizing your requests, which could add another delay.

But it's just a guess. Try what I wrote at the beginning and we will figure it out.
