Real-time speech recognition using the MediaRecorder API on the front end and Python on the back end

Posted 2025-01-31 08:31:37 · Word count 2130 · Views 4 · Comments 0

What are we trying to implement?

We deployed an AI model that receives audio streamed from the microphone and displays the recognized speech as text to the user, something like this.

What technologies are used?

  • Python for back-end and the AI model
  • React for front-end
  • The MediaRecorder API to record and configure the audio
  • WebSocket to connect to the AI API
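
Since the samples travel over a WebSocket, the shape of the payload matters. The following is a minimal sketch, assuming the back end accepts either a JSON list or raw bytes; `toJsonPayload` and `toBinaryPayload` are hypothetical helper names that do not appear in the original code. It shows that `JSON.stringify` turns a typed array into an index-keyed object rather than a list, which may surprise a Python consumer.

```javascript
// Hypothetical helpers for illustration; not part of the original code.

// JSON.stringify serializes a typed array as an object keyed by index,
// so convert it to a plain array first if the back end expects a JSON list.
function toJsonPayload(samples) {
  return JSON.stringify({ matrix: Array.from(samples) });
}

// Alternatively, send the raw bytes: WebSocket.send accepts an ArrayBuffer.
// The slice copies only the bytes covered by this typed-array view.
function toBinaryPayload(samples) {
  return samples.buffer.slice(
    samples.byteOffset,
    samples.byteOffset + samples.byteLength
  );
}

const samples = new Int16Array([0, 1, -1, 32767]);
console.log(JSON.stringify({ matrix: samples }));
// prints {"matrix":{"0":0,"1":1,"2":-1,"3":32767}}
console.log(toJsonPayload(samples));
// prints {"matrix":[0,1,-1,32767]}
console.log(toBinaryPayload(samples).byteLength);
// prints 8
```

With a binary payload, the Python side would have to read the bytes with the same sample width and byte order, which is an assumption to confirm against the actual back end.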

What's the problem though?

On the front end, I try to send audio chunks to the back end every second as an Int16Array. To make sure everything related to the microphone and audio works, I checked the recording after stopping it: I can download only the first chunk of the audio, 1 s long, and it sounds perfectly clear. However, when the audio is sent to the back end, it turns into a bunch of noise!
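
One way to narrow this down is to check what kind of bytes each chunk actually contains. The sketch below is only a diagnostic under stated assumptions: `sniffContainer` is a hypothetical helper, and it recognizes just three well-known file signatures. MediaRecorder normally emits compressed audio inside a container (often WebM in Chromium-based browsers), and labelling a Blob as `audio/wav` does not convert its contents, so a chunk that does not start with `RIFF` is not WAV data.

```javascript
// Hypothetical diagnostic helper; not part of the original code.
// Returns a label for the container signature found at the start of `bytes`.
function sniffContainer(bytes) {
  const startsWith = (signature) =>
    signature.every((value, index) => bytes[index] === value);

  if (startsWith([0x52, 0x49, 0x46, 0x46])) return "riff"; // "RIFF", used by WAV
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return "ebml"; // EBML, used by WebM/Matroska
  if (startsWith([0x4f, 0x67, 0x67, 0x53])) return "ogg"; // "OggS"
  return "unknown";
}

console.log(sniffContainer(new Uint8Array([0x1a, 0x45, 0xdf, 0xa3, 0x9f])));
// prints ebml
```

Inside the `loadend` handler this could be called as `sniffContainer(new Uint8Array(arrayBuf))`. Chunks after the first one usually continue the same stream without repeating the header, so they would typically report `unknown`, which may also be why only the first chunk plays on its own.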

Here is the part of the React code that handles the recording:

    useEffect(() => {
      if (recorder === null) {
        // Request the recorder the first time recording starts.
        if (isRecording) {
          requestRecorder().then(setRecorder, console.error);
        }
        return;
      }

      // Manage recorder state.
      if (isRecording) {
        recorder.start();
      } else if (recorder.state !== "inactive") {
        recorder.stop();
      }

      // Send the data every second (only while the recorder is active).
      const interval = setInterval(() => {
        if (recorder.state === "recording") {
          recorder.requestData();
        }
      }, 1000);

      // Obtain the audio when ready.
      const handleData = (e) => {
        setAudioURL(URL.createObjectURL(e.data));
        const audioBlob = new Blob([e.data], { type: "audio/wav; codecs=0" });

        const instanceOfFileReader = new FileReader();
        instanceOfFileReader.addEventListener("loadend", (event) => {
          const arrayBuf = event.target.result;
          console.log(arrayBuf.byteLength);
          // View the chunk's bytes as 16-bit integers (an odd trailing byte is dropped).
          const int16ArrNew = new Int16Array(arrayBuf, 0, Math.floor(arrayBuf.byteLength / 2));

          setJsonData((prevState) => ({ ...prevState, matrix: int16ArrNew }));
        });
        instanceOfFileReader.readAsArrayBuffer(audioBlob);
      };
      recorder.addEventListener("dataavailable", handleData);

      return () => {
        recorder.removeEventListener("dataavailable", handleData);
        clearInterval(interval);
      };
    }, [recorder, isRecording]);
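
If the back end expects raw 16-bit PCM, the samples would have to come from uncompressed audio rather than from the recorder's encoded chunks. The Web Audio API exposes samples as `Float32Array` values in the range -1 to 1, and the sketch below shows one common way to convert them. It is an assumption that the model wants signed 16-bit PCM, and `floatTo16BitPCM` is a hypothetical helper name.

```javascript
// Hypothetical helper; not part of the original code.
// Converts Web Audio style float samples (-1..1) to signed 16-bit PCM.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

console.log(Array.from(floatTo16BitPCM(new Float32Array([0, 1, -1, 2]))));
// prints [ 0, 32767, -32768, 32767 ]
```

The sample rate of the captured audio would still need to match what the model expects, which this sketch does not handle.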

Has anyone faced this issue before? I did a lot of research on it but found nothing that fixes it.

Comments (1)

九公里浅绿 2025-02-07 08:31:37

Just checked this question and smiled :)) Last year this was a real nightmare for me :)) So I'm just leaving the name of the library here for anyone who sees this in the future.
To achieve real-time transmission you will need WebRTC no matter what. For real-time recording you can simply use the RecordRTC package, installed with npm. There isn't much configuration and it's completely straightforward.
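
The answer gives no code, so the following is only a rough sketch of what a RecordRTC setup might look like, assuming the option names from the RecordRTC README (`recorderType`, `timeSlice`, `ondataavailable`, `desiredSampRate`, `numberOfAudioChannels`). `startStreaming` and `onChunk` are hypothetical names, the 16 kHz mono settings are assumptions about the model, and the `RecordRTC` constructor is passed in as an argument so the sketch itself needs no imports.

```javascript
// Hypothetical wrapper; `RecordRTC` is assumed to be the constructor exported
// by the `recordrtc` npm package, and `stream` a MediaStream from getUserMedia.
function startStreaming(RecordRTC, stream, onChunk) {
  const recorder = new RecordRTC(stream, {
    type: "audio",
    mimeType: "audio/wav",
    // Ask for the PCM-based WAV recorder instead of the browser's MediaRecorder.
    recorderType: RecordRTC.StereoAudioRecorder,
    numberOfAudioChannels: 1, // assumed: mono input for the model
    desiredSampRate: 16000, // assumed: 16 kHz input for the model
    timeSlice: 1000, // deliver a chunk roughly every second
    ondataavailable: onChunk, // called with a Blob for each chunk
  });
  recorder.startRecording();
  return recorder;
}

// Browser usage (assumed `socket` is an open WebSocket):
// navigator.mediaDevices.getUserMedia({ audio: true })
//   .then((stream) => startStreaming(RecordRTC, stream, (blob) => socket.send(blob)));
```

Whether each chunk carries its own WAV header is worth verifying on the receiving side before the bytes are fed to the model.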
