How do I use GradientTape to calculate the gradient of each element in a tensor?
I would like to calculate the gradient of each element in a tensor with respect to a list of watched tensors.
When I use GradientTape's gradient() on y directly, the resulting dy_dx has the dimension of my x. For example:
import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ]
    y_as_tensor = tf.stack(y_as_list, axis=0)

print("---------------------------")
print("x:", x)
print("y:", y_as_tensor)
print("y:", y_as_list)

dy_dx_from_tensor = g.gradient(y_as_tensor, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
dy_dx_from_list = g.gradient(y_as_list, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)

print("---------------------------")
print("dy_dx_from_tensor:", dy_dx_from_tensor)
print("dy_dx_from_list:", dy_dx_from_list)
results in:
---------------------------
x: [<tf.Tensor: shape=(), dtype=float32, numpy=3.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>, <tf.Tensor: shape=(), dtype=float32, numpy=5.0>]
y: tf.Tensor([ 60. 180.], shape=(2,), dtype=float32)
y: [<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>]
---------------------------
dy_dx_from_tensor: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
dy_dx_from_list: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
Note that both the tensor and the list versions' results have the same dimension as the watched x.
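These numbers can be checked by hand; the quick sketch below (plain Python, nothing TensorFlow-specific) reproduces them, which is consistent with gradient() summing over the elements of y when the target is not a scalar:

# Hand-computed per-element gradients of y0 = x0*x1*x2 and y1 = x0**2*x1*x2
# at x = (3.0, 4.0, 5.0).
dy0_dx = [ 4.0*5.0, 3.0*5.0, 3.0*4.0 ]              # [20.0, 15.0, 12.0]
dy1_dx = [ 2*3.0*4.0*5.0, 3.0**2*5.0, 3.0**2*4.0 ]  # [120.0, 45.0, 36.0]
print([ a + b for a, b in zip(dy0_dx, dy1_dx) ])    # [140.0, 60.0, 48.0]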
When I try to call the tape's gradient method for each element, I get what I want for the list, but for the tensor all the gradients are zero:
dy_dx_from_tensor_elements = [ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_tensor ]
dy_dx_from_list_elements = [ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_list ]
print("---------------------------")
print("dy_dx_from_tensor_elements:", dy_dx_from_tensor_elements)
print("dy_dx_from_list_elements:", dy_dx_from_list_elements)
yields:
dy_dx_from_tensor_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>]]
dy_dx_from_list_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]
The dy_dx_from_list_elements values are what I am looking for, but I would really like to be able to get them from the tensor, because my real-world model outputs the y values as a tensor.
Any suggestion to how I could generate gradients for every element in a tensor would be much appreciated!
Comments (1)
I think the problem is coming from iterating over a tensor. A tf.unstack or similar operation might be running internally, and all tf operations need to be within the scope of the gradient tape for them to be taken into account. Gradients will be calculated only for a tensor in relation to another tensor that was involved in its calculation. A couple of examples:
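As a minimal sketch of the idea (reusing the x and y from the question; an illustration, not a definitive implementation): iterating over the stacked tensor while the tape is still recording keeps each element connected to x, whereas doing the same iteration after the block has exited does not:

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_tensor = tf.stack([ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ], axis=0)
    # The implicit unstack from iterating runs while the tape is recording,
    # so each y_i stays connected to x.
    y_elems_inside = [ y_i for y_i in y_as_tensor ]

# The same iteration outside the context manager is not recorded,
# so the resulting tensors are unconnected to x.
y_elems_outside = [ y_i for y_i in y_as_tensor ]

print([ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for y_i in y_elems_inside ])   # per-element gradients: [20, 15, 12] and [120, 45, 36]
print([ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for y_i in y_elems_outside ])  # all zeros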
The same applies when you, for example, use tf.split:
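Again a rough sketch under the same assumptions (same x as above; the split sizes are only for illustration):

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_tensor = tf.stack([ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ], axis=0)
    # Splitting while the tape is recording keeps each piece connected to x.
    y_split_inside = tf.split(y_as_tensor, num_or_size_splits=2, axis=0)

# Splitting after the tape has stopped recording leaves the pieces unconnected to x.
y_split_outside = tf.split(y_as_tensor, num_or_size_splits=2, axis=0)

print([ g.gradient(piece, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for piece in y_split_inside ])   # per-element gradients
print([ g.gradient(piece, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for piece in y_split_outside ])  # all zeros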
According to the docs, operations are recorded by the tape only if they are executed within its context manager and at least one of their inputs is being watched.

Also, tf.stack is generally not differentiable.