How to extract and store the text generated by an automatic speech recognition deep learning app

Posted 2025-01-16 10:58:38

The app can be viewed on Hugging Face: https://huggingface.co/spaces/rowel/asr

import gradio as gr
from transformers import pipeline


model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")
gr.Interface.from_pipeline(model,
                           title="Automatic Speech Recognition (ASR)",
                           description="Using pipeline with Facebook S2T for ASR.",
                           examples=['data/ljspeech.wav',]
                           ).launch()

With so few lines of code, I can't tell where the recognized text is stored. I would like to store the sentence text in a string.

Honestly, I only know basic Python programming. I would just like to store the sentences in string variables and do something with them.

Comments (1)

吃颗糖壮壮胆 2025-01-23 10:58:38

You can unpack the Interface.from_pipeline abstraction and define your own Gradio interface. By defining your own inputs, outputs, and prediction function, you get direct access to the text prediction from the model. Here is an example.

You can test it here: https://huggingface.co/spaces/radames/Speech-Recognition-Example


import gradio as gr
from transformers import pipeline


model = pipeline(task="automatic-speech-recognition",
                 model="facebook/s2t-medium-librispeech-asr")


def predict_speech_to_text(audio):
    prediction = model(audio)
    # text variable contains your voice-to-text string
    text = prediction['text']
    return text


gr.Interface(fn=predict_speech_to_text,
             title="Automatic Speech Recognition (ASR)",
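             # gr.inputs / gr.outputs are the older Gradio namespaces; newer
             # releases expose these components as gr.Audio and gr.Textbox.
             inputs=gr.inputs.Audio(
                 source="microphone", type="filepath", label="Input"),
             outputs=gr.outputs.Textbox(label="Output"),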
             description="Using pipeline with Facebook S2T for ASR.",
             examples=['ljspeech.wav'],
             allow_flagging='never'
             ).launch()
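
If you want to keep every transcription around rather than only return it to the UI, one possible approach is to collect the strings in a small helper object. The sketch below is only an illustration: `TranscriptLog` and `fake_asr` are hypothetical names that are not part of Gradio or transformers, and `fake_asr` stands in for `model` so the snippet runs without downloading any weights. It assumes the ASR callable returns a dict with a `'text'` key, as in the example above.

```python
from typing import Callable, Dict, List


class TranscriptLog:
    """Hypothetical helper that keeps every transcribed sentence in a list."""

    def __init__(self, asr: Callable[[str], Dict[str, str]]):
        # `asr` is any callable mapping an audio file path to {"text": ...},
        # which is the shape used by `prediction['text']` above.
        self.asr = asr
        self.sentences: List[str] = []

    def predict(self, audio_path: str) -> str:
        # Run recognition, remember the string, and hand it back to the caller.
        text = self.asr(audio_path)["text"]
        self.sentences.append(text)
        return text

    def save(self, path: str) -> None:
        # Write one transcription per line to a plain text file.
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(self.sentences))


def fake_asr(audio_path: str) -> Dict[str, str]:
    # Stand-in for the real pipeline (assumption, for demonstration only).
    return {"text": f"transcript of {audio_path}"}


log = TranscriptLog(fake_asr)
first = log.predict("ljspeech.wav")  # `first` is an ordinary Python string
print(first.upper())                 # do anything you like with it
```

With the real pipeline you would presumably create `TranscriptLog(model)` and pass `log.predict` as the `fn` argument of `gr.Interface`; `log.sentences` would then hold all strings recognized while the app is running.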