How do I split a wav file into words in Python?

Posted 2025-01-09 17:53:41


E.g. the wav file is ("How are you?") and I want to split it into 3 wav files, like ("How"), ("are"), ("you"). Could you help me?


1 Comment

桃酥萝莉 2025-01-16 17:53:41


You can try this, but you need to know the timestamp of each word unless you use machine learning. If you know the timestamp at the end of each word (for example, "How" ends at 1 s, "are" at 2 s, and "you" at 2.7 s), try this.

from pydub import AudioSegment

stamps = [1, 2, 2.7]  # timestamp (in seconds) at the end of each word

originalAudio = AudioSegment.from_wav("audiofile.wav")
start = 0
for counter, current in enumerate(stamps, start=1):
    # pydub slices in milliseconds, so convert the second-based stamps
    newAudio = originalAudio[int(start * 1000):int(current * 1000)]
    newAudio.export(f"{counter}-word.wav", format="wav")
    start = current

The files will be saved as 1-word.wav, 2-word.wav etc.

If you don’t know the timings, you can try this code; it listens for silences or pauses.

from pydub import AudioSegment
from pydub.silence import split_on_silence

sound_file = AudioSegment.from_wav("sentence.wav")
audio_chunks = split_on_silence(
    sound_file,
    # must be silent for at least half a second;
    # make it shorter if the pauses are short, e.g. 100-250 ms
    min_silence_len=500,

    # consider it silent if quieter than -16 dBFS
    # (note the minus sign: dBFS levels are negative numbers)
    silence_thresh=-16
)

for i, chunk in enumerate(audio_chunks):
    out_file = f"chunk{i}.wav"
    print("exporting", out_file)
    chunk.export(out_file, format="wav")

However, after listening to your audio file, I found a lot of background noise and no clear pauses between the words.

You then said that you want to transcribe the audio to text; try this code.

First install the required libraries.

pip3 install SpeechRecognition pydub

Then run this,

import speech_recognition as sr
r = sr.Recognizer()

audiofile = 'demo55.wav'

with sr.AudioFile(audiofile) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert speech to text); "tr-tr" is the language code for Turkish
    text = r.recognize_google(audio_data, language="tr-tr")
    print(text)

I get the output,

"Merhaba benim adım Ezgi"

Not sure if this is correct as I don't speak Turkish, but I listened to it and it sounds right.
