CUDA out of memory when fine-tuning GPT-2

Posted on 01-15 07:45


RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 11.17 GiB total capacity; 10.49 GiB already allocated; 13.81 MiB free; 10.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This is the error I am getting, I have tried playing around with batch size but to no avail. I am training on google colab.
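
For reference, the max_split_size_mb option the error message points at is controlled through the PYTORCH_CUDA_ALLOC_CONF environment variable, which has to be set before CUDA is first initialized; the 128 below is only an illustrative value, not a recommendation:

import os

# Must run before the first CUDA call so the caching allocator picks it up;
# 128 MiB is just an illustrative split size.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported afterwards so the allocator sees the setting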

This is the piece of code concerned with the error:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/content/",
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    # gradient_accumulation_steps=BATCH_UPDATE,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    fp16=True,
    fp16_opt_level=APEX_OPT_LEVEL,
    warmup_steps=WARMUP_STEPS,
    learning_rate=LR,
    adam_epsilon=EPS,
    weight_decay=0.01,
    save_total_limit=1,
    load_best_model_at_end=True,
)

Any solution?


1 Comment

累赘 replied on 2025-01-22 07:45:57


Which model are you using? Just the standard gpt-2 from huggingface? I fine-tuned that model before on my own GPU which has only 6GB and was able to use batch_size of 8 without a problem.

I would try each of the following:

  1. Reduce the batch_size - you already tried it, but did you go all the way down to a batch_size of 1? Does the problem occur even then? (A sketch combining a batch size of 1 with gradient accumulation follows after this list.)
  2. I assume you already activated the GPU in Colab. The GPU assigned to you is a bit random; from my experience with the free version you usually get something like a Tesla T4 (16GB) or a Tesla K80 (24GB). Use !nvidia-smi -L to see which GPU was allocated to you. If you see that you got a model with less than 24GB, switch Notebook settings to None and then back to GPU to get a new one, or use Manage Sessions -> Terminate Session and reconnect. Try a few times until you get a good GPU, since your code might not work with 16GB or less but might just work with 24GB. Generally, clearing your resources is a good idea in case something large is already loaded and causing this problem in the first place. (A quick in-notebook check of the allocated GPU is also sketched below.)
  3. Although I am not an expert at this: I am not sure fp16 is a good idea to enable without knowing which GPU you were allocated. From what I've heard, some GPUs like the K80 do not support it natively (again, you might know more about this than me), which means it basically results in half the resources going to waste during training. In case you don't know, fp16 means lowering the floating-point precision from 32 to 16 bits, i.e. using a less precise float representation to fit twice as much into the same resources (only IF the GPU supports it, though). The GPU check sketched below also covers this.
  4. Try distilgpt2, which is a distilled model with almost the same performance (a loading sketch is at the end).
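
For point 1, this is roughly what the TrainingArguments from the question could look like with a per-device batch of 1 and gradient accumulation re-enabled so the effective batch size stays around 16. Just a sketch: the constants at the top are placeholders standing in for the ones defined elsewhere in your notebook, and 16 accumulation steps is an illustrative choice.

from transformers import TrainingArguments

# Placeholder values for the question's EPOCHS / WARMUP_STEPS / LR / EPS constants.
EPOCHS, WARMUP_STEPS, LR, EPS = 3, 100, 5e-5, 1e-8

training_args = TrainingArguments(
    output_dir="/content/",
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=1,    # smallest possible per-step batch
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,   # illustrative: 16 x 1 gives an effective batch of 16
    evaluation_strategy="epoch",
    save_strategy="epoch",
    fp16=False,                       # leave off until the GPU is known to support it (point 3)
    warmup_steps=WARMUP_STEPS,
    learning_rate=LR,
    adam_epsilon=EPS,
    weight_decay=0.01,
    save_total_limit=1,
    load_best_model_at_end=True,
)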
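For points 2 and 3, a small sketch of checking which GPU Colab actually gave you from inside the notebook and deciding whether fp16 is worth turning on. The capability threshold is a rough heuristic on my part, not an official rule:

import torch

assert torch.cuda.is_available(), "No GPU - enable it via Runtime -> Change runtime type"

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB, "
      f"compute capability {props.major}.{props.minor}")

# Rough heuristic: Volta-class (7.0) and newer cards have fast native fp16;
# the K80 (3.7) does not, so fp16 buys little there.
use_fp16 = props.major >= 7
print("fp16 worth enabling:", use_fp16)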
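For point 4, swapping in distilgpt2 is just a matter of changing the checkpoint name when loading, for example:

from transformers import AutoModelForCausalLM, AutoTokenizer

# distilgpt2 has 6 transformer layers instead of gpt2's 12, so weights,
# activations and optimizer state all take noticeably less GPU memory.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# GPT-2 tokenizers ship without a pad token; reusing EOS is the usual workaround.
tokenizer.pad_token = tokenizer.eos_token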