返回介绍

19.2. STT(Speech To Text) 语音转文本

发布于 2024-02-10 15:26:30 字数 5426 浏览 0 评论 0 收藏 0

19.2. STT(Speech To Text) 语音转文本

https://github.com/Uberi/speech_recognition

19.2.1. 安装

pip install SpeechRecognition

麦克风相关

brew install portaudio
pip install pyaudio

运行下面命令授权访问麦克风

neo@MacBook-Pro-Neo ~ % python3 -m speech_recognition			

19.2.2. 查看麦克风列表

import speech_recognition as sr

for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))			

输出结果

neo@MacBook-Pro-Neo ~/workspace/python/speech % python3 microphone.py
Microphone with name "Built-in Microphone" found for `Microphone(device_index=0)`
Microphone with name "Built-in Output" found for `Microphone(device_index=1)`			

指定麦克风设备

import speech_recognition as sr
print(sr.__version__) # just to print the version not required
r = sr.Recognizer()
mic = sr.Microphone(device_index=1) #my device index is 1, you have to put your device index			

噪声抑制

import speech_recognition as sr
print(sr.__version__) # just to print the version not required
r = sr.Recognizer()
my_mic = sr.Microphone(device_index=1) #my device index is 1, you have to put your device index
with my_mic as source:
    print("Say now!!!!")
    r.adjust_for_ambient_noise(source) #reduce noise
    audio = r.listen(source) #take voice input from the microphone
print(r.recognize_google(audio)) #to print voice into text

19.2.3. PocketSphinx 文件转文本

PocketSphinx默认仅支持英文识别,中文需要下载语言模型文件,Mandarin 为中文普通话。

brew install swig
brew install pocketsphinx
pip install PocketSphinx			

从文件识别

import speech_recognition as sr

# obtain audio from the file
recognizer = sr.Recognizer()
audioFile = sr.AudioFile(r"english.wav")
with audioFile as source:
    audio = recognizer.record(source)
# recognize speech using Sphinx
try:
    print("Sphinx thinks you said: " + recognizer.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

从麦克风识别

#!/usr/bin/env python3

import speech_recognition as sr

print(sr.__version__)

for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said: " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))    			

19.2.4. Google Cloud Speech API

使用谷歌产品先要会使用科学上网,你懂得!

import speech_recognition as sr
 
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
try:
    text = r.recognize_google(audio)
    print("You said: " + text)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service" + format(e))			

指定默认语言

text = r.recognize_google(audio, language='zh-CN', show_all= True)	
text = r.recognize_google(audio_data, language=”es-ES”)	

19.2.5. IBM Speech to Text

使用IBM的服务需要一个云账号 IBM Cloud,如你你没有请先注册一个账号,然后创建 Speech To Text 服务。

测试 Speech to Text 是否正常工作

neo@MacBook-Pro-Neo ~/workspace/python/speech % wget https://watson-developer-cloud.github.io/doc-tutorial-downloads/speech-to-text/audio-file.flac	

neo@MacBook-Pro-Neo ~/workspace/python/speech % curl -X POST -u "apikey:eXuTdDOg_l7Ljp5bV8NpFsswVq58ebf2Kr-K5dpp5SZK" \
--header "Content-Type: audio/flac" \
--data-binary audio-file.flac \
"https://api.au-syd.speech-to-text.watson.cloud.ibm.com/instances/8a7df79c-c8fe-4e31-8000-c44bbd025b22/v1/recognize"
#!/usr/bin/env python3

import speech_recognition as sr
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

# obtain path to "english.wav" in the same folder as this script
from os import path
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")
print(AUDIO_FILE)

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file


try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username="netkiller@msn.com", password="******"))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))			

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
    我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
    原文