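C#: Transcribing a WAV file to text (speech-to-text) using the System.Speech namespace
How do you use the .NET speech namespace classes to convert the audio in a WAV file to text that I can display on screen or save to a file?
I am looking for some tutorial samples.
UPDATE
Found a code sample here, but when I tried it, it gave incorrect results. Below is the VB code sample I adapted. (I don't mind the language, as long as it is VB or C#.) My assumption is that if we supply the right grammar (that is, the words we expect in the recording), we should get those words back as text. First I tried sample words that do occur in the call. Sometimes it prints only that one word and nothing else. Then I tried words that we do not expect in the recording at all. Unfortunately it printed those as well... :(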
```vb
Imports System
Imports System.Speech.Recognition

Public Class Form1
    Dim WithEvents sre As SpeechRecognitionEngine

    Private Sub btnLiterate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnLiterate.Click
        If TextBox1.Text.Trim.Length = 0 Then Exit Sub
        sre.SetInputToWaveFile(TextBox1.Text)
        Dim r As RecognitionResult
        r = sre.Recognize()
        If r Is Nothing Then
            TextBox2.Text = "Could not fetch result"
            Return
        End If
        TextBox2.Text = r.Text
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        TextBox1.Text = String.Empty
        Dim dr As DialogResult
        dr = OpenFileDialog1.ShowDialog()
        If dr = Windows.Forms.DialogResult.OK Then
            If Not OpenFileDialog1.FileName.Contains("wav") Then
                MessageBox.Show("Incorrect file")
            Else
                TextBox1.Text = OpenFileDialog1.FileName
            End If
        End If
    End Sub

    Public Sub New()
        ' This call is required by the Windows Form Designer.
        InitializeComponent()
        sre = New SpeechRecognitionEngine()
    End Sub

    Private Sub sre_LoadGrammarCompleted(ByVal sender As Object, ByVal e As System.Speech.Recognition.LoadGrammarCompletedEventArgs) Handles sre.LoadGrammarCompleted
    End Sub

    Private Sub sre_SpeechHypothesized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechHypothesizedEventArgs) Handles sre.SpeechHypothesized
        System.Diagnostics.Debug.Print(e.Result.Text)
    End Sub

    Private Sub sre_SpeechRecognitionRejected(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognitionRejectedEventArgs) Handles sre.SpeechRecognitionRejected
        System.Diagnostics.Debug.Print("Rejected: " & e.Result.Text)
    End Sub

    Private Sub sre_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles sre.SpeechRecognized
        System.Diagnostics.Debug.Print(e.Result.Text)
    End Sub

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim words As String() = New String() {"triskaidekaphobia"}
        Dim c As New Choices(words)
        Dim grmb As New GrammarBuilder(c)
        Dim grm As Grammar = New Grammar(grmb)
        sre.LoadGrammar(grm)
    End Sub
End Class
```
UPDATE (after Nov 28)
Found a way to load the default grammar. It goes like this:
sre.LoadGrammar(New DictationGrammar)
There are still problems, though. For a 6-minute file, the output is maybe 5-6 words that are completely unrelated to the voice file.
Answers (4)
The classes in System.Speech.Synthesis are for text-to-speech (primarily an accessibility feature).
You are looking for voice recognition. There is the System.Speech.Recognition namespace, available since .NET 3.0. It uses the Windows Desktop Speech engine. This might get you started, but I guess there are better engines out there.
Voice recognition is very complicated and hard to do right; there are also some commercial products available.
I realize this is an old question, but there is better information available in later questions and answers. For example, see What is the best option for transcribing speech-to-text in a asp.net web app?
Instead of calling SetInputToDefaultAudioDevice() you can call SetInputToWaveFile() to read from an audio file.
The desktop recognition engine that comes with Windows Vista and Windows 7 includes a dictation grammar, as shown in the referenced answer.
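To make the two points above concrete, here is a minimal C# console sketch that reads a WAV file with SetInputToWaveFile() and a DictationGrammar. It also addresses one likely reason the question's code prints only a word or two: a single Recognize() call returns at most one recognized phrase, so a long file needs continuous recognition. The sketch uses RecognizeAsync(RecognizeMode.Multiple) and waits for RecognizeCompleted. The class name, the default file name `recording.wav`, and the choice of a console program are assumptions for illustration; it also assumes a desktop speech recognizer is installed and the WAV is in a format the engine accepts. Accuracy on call recordings may still be poor.

```csharp
using System;
using System.Speech.Recognition; // needs a reference to System.Speech
using System.Text;
using System.Threading;

class WavTranscriber // hypothetical name, for illustration only
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        string wavPath = args.Length > 0 ? args[0] : "recording.wav";

        using (var sre = new SpeechRecognitionEngine())
        using (var done = new ManualResetEvent(false))
        {
            // Free-text dictation instead of a fixed word list.
            sre.LoadGrammar(new DictationGrammar());
            sre.SetInputToWaveFile(wavPath);

            var transcript = new StringBuilder();

            // Raised once per recognized phrase while the file is processed.
            sre.SpeechRecognized += (s, e) =>
                transcript.Append(e.Result.Text).Append(' ');

            // Raised when recognition stops, e.g. at the end of the file.
            sre.RecognizeCompleted += (s, e) => done.Set();

            // Multiple: keep recognizing phrase after phrase until input ends.
            sre.RecognizeAsync(RecognizeMode.Multiple);
            done.WaitOne();

            Console.WriteLine(transcript.ToString().Trim());
        }
    }
}
```

In a Windows Forms application like the one in the question, blocking the UI thread with WaitOne() is best avoided; updating the text box from the RecognizeCompleted handler would be the more natural design there.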
You actually need a natural language toolkit. In Python I have used NLTK http://www.nltk.org/
In .NET I have just found Antelope
https://stackoverflow.com/questions/1762040/natural-language-toolkit-equivalent-in-c
See this article as well: http://en.wikipedia.org/wiki/Speech_recognition
You should use the SpeechRecognitionEngine. To use a wave file, call SetInputToWaveFile. I wish I could help you more, but I'm no expert.
Oh, and if your word is really triskaidekaphobia, I don't think even a human speech recognition engine would recognize that...
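On the one-word grammar: when the only loaded grammar is a short Choices list, the engine can only return one of those words or nothing at all, which probably explains why a word that never occurs in the recording was still printed. A hedged C# sketch of one way to guard against that is to check the Confidence value of the result before trusting it. The word list, the 0.7 threshold, the class name, and the default file name are all assumptions for illustration, not recommended values.

```csharp
using System;
using System.Speech.Recognition; // needs a reference to System.Speech

class KeywordCheck // hypothetical name, for illustration only
{
    // Hypothetical cut-off; tune it against real recordings.
    const float MinConfidence = 0.7f;

    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        string wavPath = args.Length > 0 ? args[0] : "recording.wav";

        using (var sre = new SpeechRecognitionEngine())
        {
            // Illustrative word list; the engine can only answer with one of these.
            var words = new Choices("hello", "goodbye", "invoice");
            sre.LoadGrammar(new Grammar(new GrammarBuilder(words)));
            sre.SetInputToWaveFile(wavPath);

            // A single Recognize() call yields at most one phrase.
            RecognitionResult result = sre.Recognize();

            if (result == null)
            {
                Console.WriteLine("No phrase was recognized.");
            }
            else if (result.Confidence < MinConfidence)
            {
                Console.WriteLine("Low-confidence match ignored: {0} ({1:F2})",
                    result.Text, result.Confidence);
            }
            else
            {
                Console.WriteLine("Recognized: {0} ({1:F2})",
                    result.Text, result.Confidence);
            }
        }
    }
}
```

A fixed word list like this suits keyword spotting; for a full transcript, a DictationGrammar is the closer fit.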