C#: Transcribing a WAV file to text (speech-to-text) using the System.Speech namespace

Posted 2024-08-11 20:52:01

How do you use the .NET speech namespace classes to convert the audio in a WAV file to text that I can display on screen or save to a file?

I am looking for some tutorial samples.

UPDATE

I found a code sample here, but when I tried it, it gave incorrect results. Below is the VB code I adapted from it. (I don't mind the language, as long as it's VB or C#.) I assumed that if we supply the right grammar, i.e. the words we expect in the recording, we would get those words back as text. First I tried sample words that do occur in the call. It sometimes printed only that one word and nothing else. Then I tried words that we would never expect in the recording... Unfortunately it printed those too... :(

Imports System
Imports System.Speech.Recognition

Public Class Form1

    Dim WithEvents sre As SpeechRecognitionEngine

    Private Sub btnLiterate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnLiterate.Click
        If TextBox1.Text.Trim.Length = 0 Then Exit Sub
        sre.SetInputToWaveFile(TextBox1.Text)
        Dim r As RecognitionResult
        r = sre.Recognize()
        If r Is Nothing Then
            TextBox2.Text = "Could not fetch result"
            Return
        End If
        TextBox2.Text = r.Text
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        TextBox1.Text = String.Empty
        Dim dr As DialogResult
        dr = OpenFileDialog1.ShowDialog()
        If dr = Windows.Forms.DialogResult.OK Then
            If Not OpenFileDialog1.FileName.Contains("wav") Then
                MessageBox.Show("Incorrect file")
            Else
                TextBox1.Text = OpenFileDialog1.FileName
            End If
        End If
    End Sub

    Public Sub New()

        ' This call is required by the Windows Form Designer.
        InitializeComponent()

        sre = New SpeechRecognitionEngine()

    End Sub

    Private Sub sre_LoadGrammarCompleted(ByVal sender As Object, ByVal e As System.Speech.Recognition.LoadGrammarCompletedEventArgs) Handles sre.LoadGrammarCompleted

    End Sub

    Private Sub sre_SpeechHypothesized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechHypothesizedEventArgs) Handles sre.SpeechHypothesized
        System.Diagnostics.Debug.Print(e.Result.Text)
    End Sub

    Private Sub sre_SpeechRecognitionRejected(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognitionRejectedEventArgs) Handles sre.SpeechRecognitionRejected
        System.Diagnostics.Debug.Print("Rejected: " & e.Result.Text)
    End Sub

    Private Sub sre_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles sre.SpeechRecognized
        System.Diagnostics.Debug.Print(e.Result.Text)
    End Sub

    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim words As String() = New String() {"triskaidekaphobia"}
        Dim c As New Choices(words)
        Dim grmb As New GrammarBuilder(c)
        Dim grm As Grammar = New Grammar(grmb)
        sre.LoadGrammar(grm)
    End Sub

End Class

UPDATE (after Nov 28th)

Found a way to load a default grammar. It goes something like this:

sre.LoadGrammar(New DictationGrammar)

There are still problems here. The recognition is not exact and the output is rubbish: for a 6-minute file it gives perhaps 5-6 words of text that are totally irrelevant to the voice file.

Comments (4)

乖乖公主 2024-08-18 20:52:01

The classes in System.Speech.Synthesis are for text to speech (primarily an accessibility feature).

You are looking for voice recognition. The System.Speech.Recognition namespace has been available since .NET 3.0. It uses the Windows Desktop Speech engine. This might get you started, but I guess there are better engines out there.

Voice recognition is very complicated and hard to do right; there are also some commercial products available.
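
As a starting point, here is a minimal C# sketch for checking which desktop recognizers are installed before attempting any transcription. It assumes a Windows machine and a project reference to System.Speech; the class name ListRecognizers is only illustrative.

```csharp
using System;
using System.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // InstalledRecognizers() returns the speech recognizers available on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Print each engine's display name and the culture (language) it supports.
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);
        }
    }
}
```

If nothing is listed, there is no desktop engine for System.Speech.Recognition to use on that machine.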

梦里°也失望 2024-08-18 20:52:01

I realize this is an old question, but there is better information available in later questions and answers. For example, see "What is the best option for transcribing speech-to-text in a asp.net web app?"

Instead of calling SetInputToDefaultAudioDevice() you can call SetInputToWaveFile() to read from an audio file.

The desktop recognition engine that comes in Windows Vista and Windows 7 includes a dictation grammar as shown in the referenced answer.
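
Putting those two points together, a rough C# sketch of transcribing a whole WAV file with the dictation grammar might look like the following. The class name WavTranscriber and the default file path are hypothetical, and the sketch assumes a Windows machine with a desktop recognizer installed and a reference to System.Speech.

```csharp
using System;
using System.Speech.Recognition;
using System.Text;
using System.Threading;

class WavTranscriber
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        string wavPath = args.Length > 0 ? args[0] : @"C:\temp\sample.wav";
        var transcript = new StringBuilder();

        using (var done = new ManualResetEventSlim(false))
        using (var sre = new SpeechRecognitionEngine())
        {
            // Free-form dictation instead of a fixed list of expected words.
            sre.LoadGrammar(new DictationGrammar());
            sre.SetInputToWaveFile(wavPath);

            // Each recognized phrase is appended as its own line.
            sre.SpeechRecognized += (s, e) => transcript.AppendLine(e.Result.Text);

            // Raised once the asynchronous operation finishes, e.g. when the file ends.
            sre.RecognizeCompleted += (s, e) =>
            {
                if (e.Error != null) Console.Error.WriteLine(e.Error.Message);
                done.Set();
            };

            // Multiple mode keeps recognizing phrase after phrase until the input ends.
            sre.RecognizeAsync(RecognizeMode.Multiple);
            done.Wait();
        }

        Console.WriteLine(transcript.ToString());
    }
}
```

A single call to Recognize() performs one recognition operation, which may be part of why the question's code returned only a few words for a six-minute file. RecognizeMode.Multiple continues through the rest of the audio, although dictation accuracy still depends heavily on the recording quality.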

握住你手 2024-08-18 20:52:01

You should use the SpeechRecognitionEngine. To use a wave file, call SetInputToWaveFile. I wish I could help you more, but I'm no expert.

Oh, and if your word is really triskaidekaphobia, I don't think even a human speech recognition engine would recognize that...
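
On the one-word grammar: as far as I understand it, a constrained grammar can only return phrases it contains, so unrelated audio may still be matched to that single word, just with a low score. A small C# sketch of checking the confidence before trusting a match follows; the 0.7 threshold, the class name and the file path are arbitrary assumptions for illustration.

```csharp
using System;
using System.Speech.Recognition;

class ConfidenceCheck
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        string wavPath = args.Length > 0 ? args[0] : @"C:\temp\sample.wav";
        const float MinConfidence = 0.7f; // arbitrary cut-off, tune for your audio

        using (var sre = new SpeechRecognitionEngine())
        {
            // Same single-word grammar as in the question.
            sre.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("triskaidekaphobia"))));
            sre.SetInputToWaveFile(wavPath);

            // Recognize() returns null when nothing was recognized.
            RecognitionResult r = sre.Recognize();
            if (r == null)
                Console.WriteLine("No phrase recognized.");
            else if (r.Confidence < MinConfidence)
                Console.WriteLine("Low-confidence match ignored: {0} ({1:F2})", r.Text, r.Confidence);
            else
                Console.WriteLine("{0} ({1:F2})", r.Text, r.Confidence);
        }
    }
}
```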
