TensorFlow Dataset does not output tensors to GPU memory
I have a list of NumPy arrays, arr_list. Because my arrays all have different shapes, I am trying to use the tf.data.Dataset.from_generator function to create a TensorFlow dataset from this list.
Here's my generator:
def generator_func():
    for (a, b), c in arr_list:
        a = tf.cast(a, dtype=tf.uint16)
        b = tf.cast(b, dtype=tf.float16)
        c = tf.cast(c, dtype=tf.float32)
        yield (a, b), c
For context, a and b are the inputs to my model, and c is the output.
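For illustration, each element of arr_list is a ((a, b), c) triple of NumPy arrays whose middle dimension varies from sample to sample. A minimal hypothetical stand-in (made-up lengths, shaped to match the signature used further down) would be:

import numpy as np

# Hypothetical stand-in for arr_list: three samples with different
# lengths, shaped to match the TensorSpec signature below.
arr_list = [
    ((np.zeros((1, n), dtype=np.uint16),
      np.zeros((1, n, 300), dtype=np.float16)),
     np.zeros((1, n), dtype=np.float32))
    for n in (5, 7, 11)
]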
When I run the generator, everything works well and the tensors seem to live on the GPU as expected:

gen = generator_func()
(a, b), c = next(gen)
print(a.device)
print(b.device)
print(c.device)
"""
Output:
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
"""
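My guess is that this part is just normal eager placement: the tf.cast calls run eagerly, and eager ops place their outputs on the GPU when one is visible. A quick check along those lines:

import tensorflow as tf

# Eager ops default to the GPU when one is visible, which would explain
# why the casts in the generator above produce GPU tensors.
print(tf.config.list_physical_devices('GPU'))
x = tf.cast(tf.constant([1, 2, 3]), tf.float32)
print(x.device)  # prints .../device:GPU:0 on this machine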
But when I build a Dataset from the same generator, the tensors come out on the CPU:

data_signature = (
    (tf.TensorSpec(shape=(1, None), dtype=tf.uint16),
     tf.TensorSpec(shape=(1, None, 300), dtype=tf.float16)),
    tf.TensorSpec(shape=(1, None), dtype=tf.float32),
)

train = tf.data.Dataset.from_generator(generator_func, output_signature=data_signature)

for (a, b), c in train.take(1):
    print(a.device)
    print(b.device)
    print(c.device)
"""
Output:
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:CPU:0
"""
Sure, maybe the device scope changes when using Python generators. However, applying the copy_to_device transformation does not fix the issue:
train = train.apply(tf.data.experimental.copy_to_device("/gpu:0"))
I have also tried the prefetch_to_device transformation, without success.
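For reference, that attempt looked roughly like this (the docs say prefetch_to_device must be the final transformation in the pipeline, so I applied it last):

# Attempted prefetch onto the GPU, applied as the last transformation.
train = train.apply(tf.data.experimental.prefetch_to_device("/gpu:0"))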
How would I go about debugging this? Why won't the tensors coming out of the TensorFlow Dataset end up in GPU memory?
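For what it's worth, one sanity check I can think of (assuming a single visible GPU) is to copy a batch over explicitly, just to confirm the transfer itself works:

# Explicitly copy one batch onto the GPU; tf.identity inside the device
# scope forces the CPU tensor to be transferred.
with tf.device("/gpu:0"):
    for (a, b), c in train.take(1):
        a_gpu = tf.identity(a)
        print(a_gpu.device)  # expect .../device:GPU:0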