How to enable mixed precision training
I'm trying to train a deep learning model in VS Code, so I would like to use the GPU for that. I have CUDA 11.6, an NVIDIA GeForce GTX 1650, tensorflow-gpu==2.5.0, and pip 21.2.3 on Windows 10. The problem is that whenever I run this part of the code I get this error:

Mixed precision training with AMP or APEX (--fp16 or --bf16) and half precision evaluation (--fp16_full_eval or --bf16_full_eval) can only be used on CUDA devices.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=new_output_models_dir,
    # output_dir="dev/",
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    # dataloader_num_workers=1,
    dataloader_num_workers=0,
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=True,
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=1e-4,
    warmup_steps=500,
    save_total_limit=2,
)
I've also tested whether TensorFlow can access a GPU and whether TensorFlow was built with CUDA GPU support, using tf.config.list_physical_devices('GPU') and tf.test.is_built_with_cuda(), and both of them return True. How do I solve this issue, and why am I getting this error? Any ideas?
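For reference, a minimal sketch of the two checks described above (assuming the same TensorFlow installation; the exact script used is not shown in the question):

import tensorflow as tf

# List the physical GPU devices visible to TensorFlow.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices('GPU'))

# Report whether this TensorFlow build was compiled with CUDA support.
print("Built with CUDA:", tf.test.is_built_with_cuda())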
Comments (1)
The above error suggests that fp16=True / bf16=True is not accepted when no GPU is available to the trainer. Perhaps CUDA 11.6 is the problem here, as it has had stability issues.
Test with CUDA 11.2 and cuDNN 8.1. If that does not work, you can fall back to the fp16=False parameter.
Ref - https://www.tensorflow.org/install/source#gpu
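If you fall back to fp16=False, the only change relative to the code in the question is that flag. A minimal sketch reusing the question's arguments (new_output_models_dir is assumed to be defined elsewhere, as in the question):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=new_output_models_dir,  # assumed to be defined elsewhere, as in the question
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    dataloader_num_workers=0,
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=False,  # disable mixed precision until a CUDA device is actually detected
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=1e-4,
    warmup_steps=500,
    save_total_limit=2,
)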