How to train wav2vec2 XLSR using a local custom dataset

Published 2025-01-30 04:49:43


Comments (2)

橘香 2025-02-06 04:49:43


I suggest you extend the Common Voice (CV) Danish subset with your own dataset. Analyse the CV dataset first and format your data like the CV corpus. At this point, the file extension (.wav, .mp3 ...), sample type (float32, int ...), audio lengths, and of course the transcription format are important. Do not make your corpus sparse.

Place your data into the CV corpus folder and load the dataset. Then you should be able to fine-tune the model on the extended data using the existing code.

Do not create a completely new corpus if you are not an expert in wav2vec.

A note: you should be able to get reasonable results even with less data. What WER did you achieve, and what is your target? Hyper-parameter tuning may be the first thing to look at, rather than more data.
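To illustrate the kind of consistency check this answer describes, here is a minimal sketch that validates local WAV clips and writes Common-Voice-style metadata, using only the Python standard library. The `train.tsv` name, the `path`/`sentence` columns, and the 16 kHz mono requirement are assumptions modeled on the Common Voice layout and wav2vec2's expected input, not details from the original answer:

```python
import csv
import wave
from pathlib import Path

TARGET_RATE = 16_000  # wav2vec2 XLSR expects 16 kHz mono audio


def build_cv_style_tsv(clips_dir, transcriptions, tsv_path):
    """Validate each WAV clip and write a Common-Voice-like TSV (path, sentence).

    transcriptions maps a clip filename inside clips_dir to its sentence.
    Raises AssertionError for clips that do not match the expected format.
    """
    rows = []
    for name, sentence in transcriptions.items():
        clip = Path(clips_dir) / name
        with wave.open(str(clip), "rb") as w:
            assert w.getframerate() == TARGET_RATE, f"{name}: resample to 16 kHz"
            assert w.getnchannels() == 1, f"{name}: convert to mono"
        rows.append({"path": name, "sentence": sentence})
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "sentence"], delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A check like this catches format mismatches (sample rate, channel count, missing transcriptions) before training, which is much cheaper than discovering them mid-fine-tune.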

海拔太高太耀眼 2025-02-06 04:49:43


I've built a tool to help me fine-tune wav2vec2 models using custom data. Maybe it can help you too: https://github.com/jonatasgrosman/huggingsound.

You can install it using: pip install huggingsound

To fine-tune the XLSR model using a custom dataset, you'll need to do something like this:

from huggingsound import SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# define your custom train data
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

# and finally, fine-tune your model
model.finetune(
    output_dir, 
    train_data=train_data,
    token_set=token_set,
)
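If your transcriptions live next to the audio as same-named .txt files, the train_data list above can be built automatically instead of by hand. This is a sketch under that assumption; the paired-file layout and the helper name are mine, not part of huggingsound:

```python
from pathlib import Path


def collect_train_data(data_dir, audio_exts=(".wav", ".mp3")):
    """Pair each audio file in data_dir with a same-named .txt transcription.

    Returns a list of {"path": ..., "transcription": ...} dicts in the shape
    that huggingsound's finetune() expects for train_data.
    """
    train_data = []
    for audio in sorted(Path(data_dir).iterdir()):
        if audio.suffix.lower() not in audio_exts:
            continue
        txt = audio.with_suffix(".txt")
        if not txt.exists():
            continue  # skip clips that have no transcription file
        train_data.append({
            "path": str(audio),
            "transcription": txt.read_text(encoding="utf-8").strip().lower(),
        })
    return train_data
```

The resulting list can then be passed directly as the train_data argument of model.finetune() above.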