How to enable mixed precision training

Posted on 2025-01-29 20:53:39

I'm trying to train a deep learning model in VS Code and would like to use the GPU for it. I have CUDA 11.6, an NVIDIA GeForce GTX 1650, tensorflow-gpu==2.5.0, and pip 21.2.3 on Windows 10. Whenever I run this part of the code, I get this error: Mixed precision training with AMP or APEX (--fp16 or --bf16) and half precision evaluation (--fp16_full_eval or --bf16_full_eval) can only be used on CUDA devices.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=new_output_models_dir,
    # output_dir="dev/",
    group_by_length=True,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    # dataloader_num_workers=1,
    dataloader_num_workers=0,
    evaluation_strategy="steps",
    num_train_epochs=40,
    fp16=True,
    save_steps=400,
    eval_steps=400,
    logging_steps=400,
    learning_rate=1e-4,
    warmup_steps=500,
    save_total_limit=2,
)

I've also checked whether TensorFlow can access a GPU and whether it was built with CUDA support, using tf.config.list_physical_devices('GPU') and tf.test.is_built_with_cuda(), and both checks pass. How can I solve this issue, and why am I getting this error? Any ideas?
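
One possible explanation, offered as an assumption rather than a confirmed diagnosis: TrainingArguments belongs to the PyTorch-based Trainer in transformers, so the fp16 check depends on whether PyTorch can see a CUDA device, not on what TensorFlow reports. A minimal sketch of that check follows; the helper name cuda_visible_to_pytorch and the variable use_fp16 are made up for illustration and are not part of any library.

```python
def cuda_visible_to_pytorch() -> bool:
    """Return True only if PyTorch imports cleanly and reports a usable CUDA device."""
    try:
        import torch  # a CPU-only or missing PyTorch install is the assumed culprit
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Hypothetical flag: enable fp16 only when PyTorch itself can use the GPU.
use_fp16 = cuda_visible_to_pytorch()
print("PyTorch sees a CUDA device:", use_fp16)
```

If this prints False while the TensorFlow checks pass, the installed PyTorch build is probably CPU-only (or not installed), and installing a CUDA-enabled PyTorch build would be the thing to try. In the meantime, passing fp16=use_fp16 instead of fp16=True to TrainingArguments should avoid the error by falling back to full precision.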
