TensorFlow Dataset does not output tensors to GPU memory

Posted on 2025-01-11 07:39:34


I have a list of NumPy arrays, arr_list. Because my arrays all have different shapes, I am trying to use the tf.data.Dataset.from_generator function to create a TensorFlow dataset from this list.

Here's my generator:

import tensorflow as tf

def generator_func():
    # Yield one ((a, b), c) example at a time, casting each NumPy array
    # to a tensor of the dtype the model expects.
    for (a, b), c in arr_list:
        a = tf.cast(a, dtype=tf.uint16)
        b = tf.cast(b, dtype=tf.float16)
        c = tf.cast(c, dtype=tf.float32)
        yield (a, b), c

For context, a and b are the inputs to my model, and c is the output.
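
To make the shapes concrete, here is a purely hypothetical element of arr_list; the sequence length of 7 and the values are made up, but the nesting and dtypes match what I describe below.

import numpy as np

# Hypothetical example of one arr_list element: ((a, b), c), where the second
# dimension is variable-length (7 here) and b carries 300 features per step.
a_example = np.random.randint(0, 100, size=(1, 7)).astype(np.uint16)
b_example = np.random.rand(1, 7, 300).astype(np.float16)
c_example = np.random.rand(1, 7).astype(np.float32)
arr_list = [((a_example, b_example), c_example)]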

When I run the generator directly, everything works and the tensors appear to live on the GPU as expected:

gen = generator_func()
(a, b), c = next(gen)

print(a.device)
print(b.device)
print(c.device)

"""
Output:
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
"""

But the tensors end up on the CPU when I go through a dataset instead:

data_signature = (
    (tf.TensorSpec(shape=(1, None), dtype=tf.uint16),
     tf.TensorSpec(shape=(1, None, 300), dtype=tf.float16)),
    tf.TensorSpec(shape=(1, None), dtype=tf.float32),
)

train = tf.data.Dataset.from_generator(generator_func, output_signature=data_signature)

for (a, b), c in train.take(1):
    print(a.device)
    print(b.device)
    print(c.device)

"""
Output:
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:CPU:0
"""

Sure, maybe the device placement changes when the data goes through a Python generator. However, applying the copy_to_device transformation does not fix the issue:

train = train.apply(tf.data.experimental.copy_to_device("/gpu:0"))

I have also tried the prefetch_to_device transformation, without success.
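
A minimal sketch of what I mean by that; the buffer_size of 1 is arbitrary, and I apply it as the last transformation in the pipeline, since as far as I can tell prefetch_to_device has to be the final transformation:

train = train.apply(
    tf.data.experimental.prefetch_to_device("/gpu:0", buffer_size=1)
)

for (a, b), c in train.take(1):
    print(a.device)  # still prints a CPU device for me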


How would I go about debugging this? Why won't the tensors coming out of the TensorFlow dataset end up in GPU memory?
