Real-time speech recognition with the Web MediaRecorder API on the front-end and Python on the back-end

Posted on 2025-01-31 08:31:37

What are we trying to implement?

We deployed an AI model, and we want to stream audio from the microphone to it and display the recognized speech as text to the user, something like this.

What technologies are used?

Python for the back-end and the AI model
React for the front-end
The Web MediaRecorder API to record and configure the audio
WebSocket to connect to the AI API
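MediaRecorder encodes audio into a compressed container format that the browser chooses; the `mimeType` constructor option only selects among formats the browser supports (an unsupported one makes the constructor throw), and `recorder.mimeType` reports what is actually produced. A minimal sketch of picking a supported type up front. The candidate list and the `pickMimeType` name are hypothetical, and the support check is passed in so the helper can run outside a browser; in a browser you would pass `(t) => MediaRecorder.isTypeSupported(t)`.

```javascript
// Hypothetical candidate list; order expresses preference.
const CANDIDATE_TYPES = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// Return the first candidate the browser can record, or "" to let the
// browser use its default. `isSupported` is injected for testability.
function pickMimeType(isSupported, candidates = CANDIDATE_TYPES) {
  return candidates.find((t) => isSupported(t)) ?? "";
}

// Browser usage (not run here):
// const mimeType = pickMimeType((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
// console.log(recorder.mimeType); // the format the chunks will actually be in
```

Logging `recorder.mimeType` once is a cheap way to confirm what format the back-end is really receiving.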

What's the problem though?

On the front-end, I send an audio chunk to the back-end every second as an Int16Array. To make sure everything related to the mic and audio works, I also checked that after I stop recording I can download the first chunk of the audio (only 1 s long), and it sounds perfectly clear. However, when the audio is sent to the back-end, it turns into a bunch of noise!
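One explanation worth checking: MediaRecorder chunks contain encoded audio (typically Opus in WebM or Ogg, or AAC in MP4, depending on the browser), so the same bytes can play back fine as a downloaded file yet sound like noise when reinterpreted as an `Int16Array` of samples. A minimal sketch for checking what a chunk actually contains by inspecting its leading "magic" bytes; `detectContainer` is a hypothetical helper. With WebM in particular, usually only the first chunk starts with the container header, so later chunks are expected to come back as `"unknown"`.

```javascript
// Identify a container format from the first bytes of a chunk.
// `bytes` is a Uint8Array, e.g. new Uint8Array(await blob.arrayBuffer()).
function detectContainer(bytes) {
  const ascii = (start, end) => String.fromCharCode(...bytes.subarray(start, end));
  if (
    bytes.length >= 4 &&
    bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3
  ) {
    return "webm"; // EBML header (WebM / Matroska)
  }
  if (bytes.length >= 4 && ascii(0, 4) === "OggS") return "ogg";
  if (bytes.length >= 12 && ascii(0, 4) === "RIFF" && ascii(8, 12) === "WAVE") return "wav";
  if (bytes.length >= 8 && ascii(4, 8) === "ftyp") return "mp4";
  return "unknown"; // e.g. raw PCM, or a later chunk without a header
}

// Browser usage (not run here):
// recorder.addEventListener("dataavailable", async (e) => {
//   const bytes = new Uint8Array(await e.data.arrayBuffer());
//   console.log(recorder.mimeType, detectContainer(bytes));
// });
```

If this reports `webm`, `ogg` or `mp4`, the back-end is receiving compressed audio and would need to decode it before treating it as 16-bit samples.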

Here's the part of the React code that handles the recording:

    useEffect(() => {
      // Lazily create the MediaRecorder the first time recording is requested.
      // Nothing else can be set up until it exists (this also avoids starting
      // an interval that the cleanup below would never clear).
      if (recorder === null) {
        if (isRecording) {
          requestRecorder().then(setRecorder, console.error);
        }
        return;
      }

      // Manage recorder state. start() and stop() throw InvalidStateError
      // when called in the wrong state, so check recorder.state first.
      if (isRecording && recorder.state === "inactive") {
        recorder.start();
      } else if (!isRecording && recorder.state !== "inactive") {
        recorder.stop();
      }

      // Ask the recorder to emit the data collected so far, once per second.
      const interval = setInterval(() => {
        if (recorder.state === "recording") {
          recorder.requestData();
        }
      }, 1000);

      // Obtain the audio when ready.
      const handleData = (e) => {
        setAudioURL(URL.createObjectURL(e.data));
        // The Blob "type" is only a label: it does not convert the data,
        // which stays in the recorder's own format (see recorder.mimeType).
        const audioBlob = new Blob([e.data], { type: "audio/wav; codecs=0" });

        const reader = new FileReader();
        reader.addEventListener("loadend", (event) => {
          const arrayBuf = event.target.result;
          console.log(arrayBuf.byteLength);
          // Reinterpret the raw bytes as 16-bit integers (odd trailing byte dropped).
          const int16ArrNew = new Int16Array(arrayBuf, 0, Math.floor(arrayBuf.byteLength / 2));
          setJsonData((prevState) => ({ ...prevState, matrix: int16ArrNew }));
        });
        reader.readAsArrayBuffer(audioBlob);
      };

      recorder.addEventListener("dataavailable", handleData);
      return () => {
        recorder.removeEventListener("dataavailable", handleData);
        clearInterval(interval);
      };
    }, [recorder, isRecording]);
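If the Python back-end expects raw 16-bit PCM (the post doesn't say which format or sample rate the model expects), one commonly used alternative to MediaRecorder is to capture Float32 samples in the range [-1, 1] with the Web Audio API (for example in an `AudioWorkletProcessor`) and convert them explicitly, instead of reinterpreting encoded bytes as `new Int16Array(arrayBuf, ...)` does above. Separately, if `jsonData` is sent with `JSON.stringify` (an assumption, since that code isn't shown), a typed array serializes as an object keyed by index: `JSON.stringify(new Int16Array([1, 2]))` gives `{"0":1,"1":2}`, not `[1,2]`. A minimal sketch of both steps; the function names and the `{ matrix, sampleRate }` payload shape are hypothetical.

```javascript
// Convert Float32 samples in [-1, 1] (Web Audio API format) to 16-bit PCM.
// Values outside the range are clamped; fractional results are truncated
// toward zero when stored in the Int16Array.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Build a JSON message for the WebSocket. Array.from() turns the typed
// array into a plain array so it serializes as a JSON array.
function toJsonPayload(int16, sampleRate) {
  return JSON.stringify({ matrix: Array.from(int16), sampleRate });
}

// Browser usage (not run here), with a hypothetical WebSocket `ws`:
// ws.send(toJsonPayload(floatTo16BitPCM(samples), audioContext.sampleRate));
// or send the bytes as a binary frame instead of JSON:
// ws.send(floatTo16BitPCM(samples).buffer);
```

Sending the sample rate matters because an `AudioContext` often runs at 44.1 or 48 kHz, which may not match what the model was trained on. A binary frame is in the platform's byte order (little-endian on common hardware), so on the Python side it would typically be decoded as little-endian int16, for example with NumPy's `frombuffer`.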

Has anyone faced this issue before? I've done a lot of research on it but found nothing that fixes it.
